This page collects papers on artificial intelligence published around the world. Summaries are generated with Google Gemini, and the page is operated on a non-profit basis. Copyright in each paper belongs to its authors and their institutions. When sharing, please cite the source.
Agentic large language models improve retrieval-based radiology question answering
Created by
Haebom
Authors
Sebastian Wind, Jeta Sopa, Daniel Truhn, Mahshad Lotfinia, Tri-Thien Nguyen, Keno Bressem, Lisa Adams, Mirabela Rusu, Harald Köstler, Gerhard Wellein, Andreas Maier, Soroosh Tayebi Arasteh
Outline
RaR (Radiology Retrieval and Reasoning) is an agentic, multi-stage retrieval and reasoning framework designed to improve the diagnostic accuracy, factual consistency, and clinical reliability of LLMs in radiology question answering (QA). It is intended to overcome the limitations of existing single-stage retrieval methods. The authors evaluated RaR on 25 LLMs ranging from 0.5B to more than 670B parameters, using the RSNA-RadioQA and ExtendedQA datasets as well as an internal dataset of radiology board examination questions, and report improved diagnostic accuracy and fewer hallucinations.
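To make the contrast between single-stage and multi-stage retrieval concrete, here is a minimal toy sketch. It is not the authors' RaR implementation. The corpus strings are abstract placeholders, not clinical facts, and the keyword-overlap retriever is an assumption. The functions `retrieve`, `single_stage`, `multi_stage`, and `reason` are all hypothetical stand-ins for real search tools and LLM calls. Running one targeted retrieval per candidate answer is just one plausible form of multi-stage retrieval.

```python
import re

# Hypothetical toy knowledge base: placeholder strings, not clinical facts.
CORPUS = [
    "diagnosis alpha typically shows finding x and finding y",
    "diagnosis beta typically shows finding z",
    "diagnosis gamma typically shows finding x only",
]


def tokens(text: str) -> set[str]:
    """Lowercased word set used for naive keyword-overlap retrieval."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k passages sharing the most words with the query (assumed retriever)."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]


def single_stage(question: str) -> list[str]:
    """Conventional RAG baseline: a single retrieval call using the whole question."""
    return retrieve(question, k=1)


def multi_stage(options: list[str]) -> dict[str, list[str]]:
    """Hypothetical multi-stage step: one targeted retrieval call per candidate answer."""
    return {option: retrieve(option, k=1) for option in options}


def reason(question: str, evidence: dict[str, list[str]]) -> str:
    """Stand-in for the LLM reasoning step: choose the candidate whose
    retrieved evidence overlaps most with the question."""
    q = tokens(question)

    def support(option: str) -> int:
        return sum(len(q & tokens(doc)) for doc in evidence[option])

    return max(evidence, key=support)


if __name__ == "__main__":
    question = "patient with finding x and finding y which diagnosis"
    options = ["diagnosis alpha", "diagnosis beta", "diagnosis gamma"]
    print(single_stage(question))      # one passage for the whole question
    evidence = multi_stage(options)    # one passage per candidate answer
    print(reason(question, evidence))  # diagnosis alpha
```

Gathering evidence for every candidate lets the reasoning step compare alternatives side by side. The single-stage baseline, by contrast, commits to whatever its one query returns.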
Takeaways, Limitations
• Takeaways:
◦ RaR significantly improved average diagnostic accuracy over both zero-shot prompting and conventional online RAG.
◦ The gains from RaR were largest for small models.
◦ RaR reduced hallucinations by retrieving clinically relevant context, grounding its answers in factual evidence.
◦ RaR yielded additional gains even for clinically fine-tuned models.
• Limitations:
◦ For very large models (>200B parameters), the improvement was minimal.
◦ Further studies are needed to confirm RaR's clinical utility.