Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

Created by
  • Haebom

Authors

Weike Zhao, Chaoyi Wu, Yanjie Fan, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Xin Sun, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie

Outline

DeepRare is an agentic system for rare disease diagnosis built on a large language model (LLM). It processes diverse clinical input data, produces a ranked list of diagnostic hypotheses for rare diseases, and transparently displays the reasoning behind each hypothesis. The system consists of a central host with a long-term memory module and a specialized agent server that integrates more than 40 specialized tools and up-to-date medical knowledge sources. This modular, scalable design lets it perform complex diagnostic inference while remaining traceable and adaptable. In an evaluation on eight datasets, DeepRare achieved 100% accuracy for 1,013 of 2,919 diseases and outperformed 15 other methods, including existing bioinformatics diagnostic tools, LLMs, and other agent systems. Its Recall@1 averaged 57.18%, 23.79 percentage points higher than the second-best method (Reasoning LLM). In the multimodal input scenario, its Recall@1 was 70.60%, compared with 53.20% for Exomiser, and manual validation of the reasoning process by clinical experts showed a 95.40% agreement rate. DeepRare is implemented as a user-friendly web application (http://raredx.cn/doctor).
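The Recall@1 figures above measure how often the correct diagnosis is the top-ranked hypothesis. The sketch below shows one common way to compute Recall@k, assuming each case has exactly one true diagnosis. The function name `recall_at_k` and the toy cases are hypothetical illustrations, not the authors' evaluation code or data. The last line works out the second-best method's score implied by the reported 23.79-point gap.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[Sequence[str]], truths: Sequence[str], k: int = 1) -> float:
    """Fraction of cases whose true diagnosis appears among the top-k hypotheses.

    Assumes one true diagnosis per case (an assumption of this sketch).
    """
    if len(ranked) != len(truths):
        raise ValueError("ranked and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(1 for preds, truth in zip(ranked, truths) if truth in preds[:k])
    return hits / len(truths)


# Made-up cases for illustration only; not taken from the paper's datasets.
ranked_lists = [
    ["Fabry disease", "Gaucher disease", "Pompe disease"],
    ["Marfan syndrome", "Ehlers-Danlos syndrome"],
    ["Wilson disease", "Hemochromatosis"],
    ["Pompe disease", "Fabry disease"],
]
true_labels = [
    "Fabry disease",
    "Ehlers-Danlos syndrome",
    "Wilson disease",
    "Gaucher disease",
]

recall_1 = recall_at_k(ranked_lists, true_labels, k=1)  # 2 of 4 cases hit at rank 1
recall_2 = recall_at_k(ranked_lists, true_labels, k=2)  # 3 of 4 cases hit within rank 2

# Second-best average Recall@1 implied by the summary: 57.18 - 23.79 percentage points.
implied_second_best = round(57.18 - 23.79, 2)
```

On this toy data Recall@1 is 0.5 and Recall@2 is 0.75, and the implied second-best average Recall@1 comes out to 33.39%.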

Takeaways, Limitations

Takeaways:
Demonstrates the strong performance of an LLM-based rare disease diagnosis system.
Achieves higher accuracy and Recall@1 scores than existing methods.
Can process diverse (multimodal) clinical data.
Provides a transparent and traceable diagnostic reasoning process.
Is implemented as a user-friendly web application.
Limitations:
The size and diversity of the evaluation datasets need further review.
Further research is needed to establish generalizability to real-world clinical settings.
Further research is needed on error analysis and on measures to address the errors found.
The accessibility and usability of the web application need continuous improvement.