Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Clinically Grounded Agent-based Report Evaluation: An Interpretable Metric for Radiology Report Generation

Created by
  • Haebom

Authors

Radhika Dua, Young Joon (Fred) Kwon, Siddhant Dogra, Daniel Freedman, Diana Ruan, Motaz Nashawaty, Danielle Rigau, Daniel Alexander Alber, Kang Zhang, Kyunghyun Cho, Eric Karl Oermann

Outline

This paper proposes ICARE, an interpretable, clinically grounded evaluation framework intended to support the safe deployment of radiology report generation (RRG) models. ICARE uses large language model (LLM) agents and dynamic multiple-choice question answering (MCQA): two agents, one given the ground-truth report and one given the generated report, generate clinically meaningful questions, and the framework measures how often their answers agree. This answer agreement serves as an interpretable proxy for clinical precision and recall, and because each score is tied to specific question-answer pairs, the assessment is transparent and interpretable. Clinician studies show that ICARE aligns significantly more closely with expert judgment than existing metrics. Perturbation analyses show that it is sensitive to clinical content and reproducible, and model comparisons reveal interpretable error patterns.
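The summary does not describe how ICARE is implemented. As a rough illustration of how answer agreement can act as precision and recall proxies, here is a minimal Python sketch under stated assumptions. The LLM agents are replaced by pre-recorded toy answers. `MCQ`, `agreement`, and `icare_like_scores` are hypothetical names. The mapping is also assumed: questions drawn from the ground-truth report probe recall, and questions drawn from the generated report probe precision. This is not the paper's code.

```python
from dataclasses import dataclass


@dataclass
class MCQ:
    """One multiple-choice question and the option each agent chose (hypothetical structure)."""
    question: str
    source: str            # "reference" or "generated": the report the question was drawn from
    answer_ref_agent: str  # option chosen by the agent that reads the ground-truth report
    answer_gen_agent: str  # option chosen by the agent that reads the generated report


def agreement(questions: list[MCQ]) -> float:
    """Fraction of questions on which both agents chose the same option (NaN if empty)."""
    if not questions:
        return float("nan")
    return sum(q.answer_ref_agent == q.answer_gen_agent for q in questions) / len(questions)


def icare_like_scores(questions: list[MCQ]) -> dict:
    """Agreement-based precision/recall proxies; an assumed mapping, not the paper's code."""
    from_ref = [q for q in questions if q.source == "reference"]
    from_gen = [q for q in questions if q.source == "generated"]
    return {
        # Recall proxy (assumed): are ground-truth findings preserved in the generated report?
        "recall": agreement(from_ref),
        # Precision proxy (assumed): are findings in the generated report supported by the ground truth?
        "precision": agreement(from_gen),
        # Disagreements are kept so each score can be traced back to concrete question-answer pairs.
        "errors": [q for q in questions if q.answer_ref_agent != q.answer_gen_agent],
    }


# Toy, hand-written answers standing in for the two LLM agents.
questions = [
    MCQ("Is there a right pleural effusion?", "reference", "A) Yes", "A) Yes"),
    MCQ("What is the heart size?", "reference", "B) Enlarged", "A) Normal"),
    MCQ("Is a pneumothorax present?", "generated", "B) No", "B) No"),
    MCQ("Are there support devices?", "generated", "A) Yes", "A) Yes"),
]
scores = icare_like_scores(questions)
print(scores["precision"], scores["recall"])  # → 1.0 0.5
for q in scores["errors"]:
    print("Disagreement:", q.question)
```

The sketch returns the disagreeing questions alongside the scores. This reflects the summary's point that each score is linked to question-answer pairs, so a low recall here can be traced to the missed heart-size finding. In the framework as described, the questions and answers would come from LLM agents rather than a fixed list.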

Takeaways, Limitations

Takeaways:
ICARE provides an interpretable evaluation framework for radiology report generation (RRG).
It aligns more closely with expert judgment than conventional black-box metrics.
It provides transparent, interpretable assessments by linking scores to question-answer pairs.
Model comparisons enable interpretable error-pattern analysis.
Limitations:
Further research is needed on how well ICARE generalizes.
Its applicability needs to be verified across different radiological imaging types and clinical settings.
Its reliance on large language model (LLM) agents may introduce limitations of its own.