Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge

Created by
  • Haebom

Authors

Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E. Kinahan, Yu Qiao

Outline

This paper presents VisionUnite, a vision-language foundation model augmented with clinical knowledge, intended to improve ophthalmic diagnosis in regions with limited access to healthcare. VisionUnite is pretrained on 1.24 million image-text pairs and then fine-tuned on the MMFundus dataset, which contains over 290,000 high-quality fundus image-text pairs and roughly 890,000 simulated doctor-patient conversations. In the reported experiments, VisionUnite outperforms existing generative models such as GPT-4V and Gemini Pro and achieves diagnostic performance comparable to that of junior ophthalmologists. Its strong results across a range of clinical scenarios (e.g., open-ended multi-disease diagnosis, clinical explanation, and patient interaction) suggest it could serve as an early screening tool for ophthalmic disease and as an aid in ophthalmologist training. The authors present VisionUnite as a significant advance for ophthalmology, with implications for diagnosis, medical education, and the understanding of disease mechanisms. The source code is available on GitHub.

Takeaways, Limitations

Takeaways:
It can help improve the accuracy of ophthalmic disease diagnosis in regions with limited access to medical care.
It can be used to strengthen the diagnostic skills of junior ophthalmologists and make their training more efficient.
It can serve as a versatile tool across a variety of clinical scenarios.
It can help improve the understanding of rare eye diseases.
Limitations:
The generalizability of the dataset used to evaluate the model's performance needs further validation.
Further research is needed to verify performance and ensure safety in real clinical settings.
Model errors and biases need to be analyzed and addressed.
The explainability of the model's decision-making process needs to be improved.