Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Agentic large language models improve retrieval-based radiology question answering

Created by
  • Haebom

Author

Sebastian Wind, Jeta Sopa, Daniel Truhn, Mahshad Lotfinia, Tri-Thien Nguyen, Keno Bressem, Lisa Adams, Mirabela Rusu, Harald Köstler, Gerhard Wellein, Andreas Maier, Soroosh Tayebi Arasteh

Outline

This paper proposes an agentic retrieval-augmented generation (RAG) framework for radiology question answering (QA). To overcome the limitations of conventional single-step retrieval, the framework lets a large language model (LLM) autonomously decompose radiology questions, iteratively retrieve targeted clinical evidence from Radiopaedia.org, and dynamically synthesize evidence-grounded responses. The framework was evaluated with 25 LLMs spanning diverse architectures, parameter scales (0.5B to >670B), and training paradigms (general-purpose, reasoning-optimized, and clinically fine-tuned) on 104 expert-curated radiology questions from the RSNA-RadioQA and ExtendedQA datasets and 65 real radiology exam questions. The results show that agentic retrieval significantly improves mean diagnostic accuracy over both zero-shot prompting and conventional online RAG, with the largest gains in small models. It also strengthens factual grounding by reducing hallucinations and retrieving clinically relevant context. These benefits extend to clinically fine-tuned models as well. All datasets, code, and the full agentic framework are publicly available.
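The decompose → retrieve → synthesize loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the keyword-match stub standing in for Radiopaedia.org search, and the string-template "synthesis" (which a real system would replace with LLM calls) are all hypothetical.

```python
# Hedged sketch of an agentic RAG loop for radiology QA.
# All function names and the stub retriever are illustrative only;
# the real framework uses LLM calls for decomposition and synthesis
# and searches Radiopaedia.org for evidence.
from dataclasses import dataclass


@dataclass
class Evidence:
    query: str
    passage: str


def decompose(question: str) -> list[str]:
    # In the real framework the LLM generates targeted sub-queries;
    # a trivial placeholder returns the question unchanged.
    return [question]


def retrieve(query: str, corpus: dict[str, str]) -> list[Evidence]:
    # Stand-in for online retrieval: naive keyword match over a
    # local dictionary of article title -> text.
    return [
        Evidence(query, text)
        for text in corpus.values()
        if any(word.lower() in text.lower() for word in query.split())
    ]


def agentic_answer(
    question: str, corpus: dict[str, str], max_rounds: int = 3
) -> tuple[str, list[Evidence]]:
    evidence: list[Evidence] = []
    queries = decompose(question)
    for _ in range(max_rounds):
        hits = [h for q in queries for h in retrieve(q, corpus)]
        evidence.extend(hits)
        # A real agent would ask the LLM whether the evidence suffices
        # and, if not, emit refined follow-up queries; this sketch
        # simply stops once any evidence is found.
        if hits:
            break
    context = " ".join(e.passage for e in evidence)
    # Final synthesis step (an evidence-grounded LLM call in practice).
    answer = f"Based on {len(evidence)} passage(s): {context[:80]}"
    return answer, evidence
```

The iterative loop is what distinguishes this from single-step RAG: retrieval can be repeated with refined queries until the agent judges the gathered evidence sufficient.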

Takeaways, Limitations

Takeaways:
An agentic RAG framework can improve both factual grounding and diagnostic accuracy in radiology QA.
It is particularly effective in improving the performance of small LLMs.
Reduces hallucinations and increases retrieval of clinically relevant information.
The benefits of agentic retrieval also extend to clinically fine-tuned models.
Supports further research and clinical applications through publicly available datasets and code.
Limitations:
For very large models (>200B parameters), the performance improvement is minimal.
Further validation studies are needed to determine its clinical utility.