Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, simply cite the source.

Agentic large language models improve retrieval-based radiology question answering

Created by
  • Haebom

Authors

Sebastian Wind, Jeta Sopa, Daniel Truhn, Mahshad Lotfinia, Tri-Thien Nguyen, Keno Bressem, Lisa Adams, Mirabela Rusu, Harald Köstler, Gerhard Wellein, Andreas Maier, Soroosh Tayebi Arasteh

Outline

RaR (Radiology Retrieval and Reasoning) is a multi-stage retrieval and reasoning framework designed to improve the diagnostic accuracy, factual consistency, and clinical reliability of LLMs in radiology question answering (QA). It targets the limitations of existing single-stage retrieval methods. The authors evaluated RaR on 25 LLMs (0.5B to >670B parameters) using the RSNA-RadioQA and ExtendedQA datasets and an internal set of radiology board exam questions, and report improved diagnostic accuracy and fewer hallucinations.
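
To make the contrast between single-stage and multi-stage retrieval concrete, here is a minimal Python sketch. It is not the RaR implementation: this summary does not describe RaR's actual stages, so the toy corpus, the word-overlap retriever, the per-option querying strategy, and every function name below are assumptions made purely for illustration.

```python
# Illustrative sketch only; NOT the RaR pipeline. All names and data are hypothetical.

# Toy in-memory corpus standing in for a real radiology knowledge source.
CORPUS = [
    "Pneumothorax appears as a visceral pleural line with absent lung markings beyond it.",
    "Pleural effusion causes blunting of the costophrenic angle on upright chest radiographs.",
    "Lobar pneumonia shows consolidation with air bronchograms.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of words without trailing punctuation."""
    return {word.strip(".,?").lower() for word in text.split()}


def retrieve(query, k=1):
    """Rank passages by word overlap with the query (stand-in for a real search backend)."""
    query_tokens = tokenize(query)
    ranked = sorted(CORPUS, key=lambda p: len(query_tokens & tokenize(p)), reverse=True)
    return ranked[:k]


def single_stage(question):
    """Baseline: a single retrieval call using the question text alone."""
    return retrieve(question, k=1)


def multi_stage(question, options):
    """Assumed multi-stage variant: one targeted retrieval per answer option,
    followed by a toy 'reasoning' step that picks the best-supported option."""
    # Stage 1: gather evidence separately for each candidate answer.
    evidence = {opt: retrieve(f"{question} {opt}", k=1)[0] for opt in options}

    # Stage 2: score each option by how many of its words its evidence mentions.
    def support(opt):
        return len(tokenize(opt) & tokenize(evidence[opt]))

    best = max(options, key=support)
    return best, evidence


question = "Chest radiograph shows blunting of the costophrenic angle. What is the most likely diagnosis?"
options = ["Pneumothorax", "Pleural effusion", "Lobar pneumonia"]

answer, evidence = multi_stage(question, options)
print(answer)
```

In a real system the word-overlap retriever would be replaced by a search backend and the scoring step by an LLM call; the sketch only shows how per-option evidence gathering differs structurally from a single query.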

Takeaways, Limitations

Takeaways:
RaR significantly improved mean diagnostic accuracy over both zero-shot prompting and conventional online RAG.
The largest performance gains appeared in small models.
RaR reduced hallucinations and retrieved clinically relevant context, grounding its answers in factual evidence.
Even clinically fine-tuned models gained additional benefit from RaR.
Limitations:
For large models (>200B parameters), the performance improvement was minimal.
Further studies are needed to confirm its clinical utility.