Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions. When sharing, please cite the source.

Citrus-V: Advancing Medical Foundation Models with Unified Medical Image Grounding for Clinical Reasoning

Created by
  • Haebom

Authors

Guoxin Wang, Jun Zhao, Xinyi Liu, Yanbo Liu, Xuyang Cao, Chao Li, Zhuoyun Liu, Qintian Sun, Fangru Zhou, Haoqiang Xing, Zhenhong Yang

Outline

Citrus-V is a multimodal medical foundation model that combines medical image analysis with textual reasoning. It integrates detection, segmentation, and multimodal chain-of-thought reasoning so that pixel-level lesion localization, structured report generation, and physician-level diagnostic reasoning are handled in a single framework. The paper proposes a novel multimodal training approach and releases a curated open-source dataset covering reasoning, detection, segmentation, and document understanding tasks. According to the authors, Citrus-V outperforms existing open-source medical models and expert-level imaging systems across multiple benchmarks. The result is an integrated pipeline from visual evidence to clinical reasoning that supports accurate lesion quantification, automated reporting, and a reliable second opinion.
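As a rough illustration of what "lesion quantification" from a pixel-level segmentation can mean in practice, the sketch below derives a pixel count, a physical area, and a bounding box from a binary mask. It is not taken from the paper: the function name `quantify_lesion`, the mask layout (a list of rows with 1 marking lesion pixels), and the `spacing_mm` parameter are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # hypothetical format: rows of 0/1 values, 1 = lesion pixel


def quantify_lesion(mask: Mask, spacing_mm: Tuple[float, float] = (1.0, 1.0)) -> Dict:
    """Summarize a binary lesion mask (illustrative only, not the paper's method).

    spacing_mm is the assumed physical size of one pixel as (row, column) in mm.
    """
    # Collect the (row, column) coordinates of every lesion pixel.
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not coords:
        return {"pixels": 0, "area_mm2": 0.0, "bbox": None}

    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    pixel_area = spacing_mm[0] * spacing_mm[1]

    return {
        "pixels": len(coords),
        "area_mm2": len(coords) * pixel_area,
        # Bounding box as (min_row, min_col, max_row, max_col), inclusive.
        "bbox": (min(rows), min(cols), max(rows), max(cols)),
    }


example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
result = quantify_lesion(example_mask, spacing_mm=(0.5, 0.5))
print(result)
```

With a 0.5 mm pixel spacing, the five lesion pixels in `example_mask` correspond to an area of 1.25 mm². A real pipeline would read the spacing from the image metadata rather than hard-coding it.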

Takeaways, Limitations

Takeaways:
It performs the main medical image analysis tasks (detection, segmentation, and reasoning) in a single framework.
It outperforms existing open-source models and expert systems on the reported benchmarks, supporting accurate and reliable diagnosis.
It can improve clinical workflow efficiency by locating lesions at the pixel level and generating reports automatically.
It contributes to medical AI research by releasing a curated open-source dataset.
Limitations:
The paper does not explicitly discuss limitations. Further validation and performance evaluation on large datasets will likely be needed before practical clinical use. Further research on the model's explainability and reliability may also be necessary.