Daily Arxiv

This page organizes artificial-intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Does Bigger Mean Better? Comparative Analysis of CNNs and Biomedical Vision Language Models in Medical Diagnosis

Created by
  • Haebom

Author

Ran Tong, Jiaqi Liu, Su Liu, Jiexi Xu, Lanruo Wang, Tong Wang

Outline

This paper compares a supervised lightweight CNN with a state-of-the-art zero-shot medical Vision-Language Model (VLM), BiomedCLIP, for the automated interpretation of chest X-ray images. Two diagnostic tasks are evaluated: pneumonia detection on the PneumoniaMNIST benchmark and tuberculosis detection on the Shenzhen TB dataset. Experimental results show that the supervised CNN is a competitive baseline in both cases. While the VLM initially performs poorly in the zero-shot setting, adjusting its decision threshold substantially improves performance. For pneumonia detection, the threshold-adjusted zero-shot VLM achieves an F1-score of 0.8841, outperforming the supervised CNN's 0.8803. For tuberculosis detection, the adjustment raises the F1-score from 0.4812 to 0.7684, approaching the supervised baseline's 0.7834. The study highlights that appropriate calibration is essential to realize the full diagnostic capabilities of zero-shot VLMs, allowing them to match or exceed efficient, task-specific supervised models.
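Since the paper's key point is that simple decision-threshold calibration closes most of the zero-shot VLM's performance gap, the sketch below illustrates one common way to pick such a threshold: sweep candidate thresholds on a held-out validation split and keep the one that maximizes F1. This is a minimal illustration, not the authors' procedure; the synthetic `labels` and `scores` stand in for ground-truth labels and zero-shot similarity scores from a model such as BiomedCLIP, and `best_f1_threshold` is a hypothetical helper name.

```python
# Minimal sketch: choose the decision threshold that maximizes F1 on a
# validation split. `scores` would normally be zero-shot similarity scores
# from a VLM (e.g., image-text similarity for a "pneumonia" prompt); here
# they are synthetic placeholders.
import numpy as np
from sklearn.metrics import precision_recall_curve


def best_f1_threshold(labels: np.ndarray, scores: np.ndarray) -> tuple[float, float]:
    """Return (threshold, f1) maximizing F1 over the validation scores."""
    precision, recall, thresholds = precision_recall_curve(labels, scores)
    # precision/recall have one more element than thresholds; drop the last point.
    denom = np.clip(precision[:-1] + recall[:-1], 1e-12, None)
    f1 = 2 * precision[:-1] * recall[:-1] / denom
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Synthetic example data (placeholders for real labels and VLM scores).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = 0.6 * labels + 0.4 * rng.random(200)  # noisy scores correlated with labels

threshold, f1 = best_f1_threshold(labels, scores)
print(f"chosen threshold={threshold:.3f}, validation F1={f1:.3f}")
```

The chosen threshold would then be applied unchanged to the test set, mirroring the paper's finding that calibration alone, without any fine-tuning of the VLM, can lift zero-shot F1 close to or above the supervised baseline.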

Takeaways, Limitations

Takeaways:
A supervised CNN serves as a strong baseline for chest X-ray diagnosis.
The performance of a zero-shot VLM can be significantly improved by adjusting the decision threshold.
Calibrated zero-shot VLMs can outperform supervised CNNs on certain tasks.
Proper calibration is critical to maximizing the diagnostic capabilities of a zero-shot VLM.
Limitations:
No Limitations are explicitly specified in the paper.