Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Is ChatGPT-5 Ready for Mammogram VQA?

Created by
  • Haebom

Authors

Qiang Li, Shansong Wang, Mingzhe Hu, Mojtaba Safari, Zachary Eidex, Xiaofeng Yang

Outline

This paper systematically evaluates the GPT-5 family and GPT-4o on four publicly available mammography datasets (EMBED, InBreast, CMMD, and CBIS-DDSM) across three visual question answering (VQA) tasks: BI-RADS assessment, abnormality detection, and malignancy classification. GPT-5 outperformed the other GPT models, but it fell short of both human experts and domain-specific fine-tuned models. On each dataset, GPT-5 performed comparatively well on questions about breast density, distortion, mass, and microcalcification and on malignancy classification, yet its sensitivity and specificity remained lower than those of human experts. The marked improvement from GPT-4o to GPT-5 suggests that large language models (LLMs) could support mammography VQA tasks in the future.
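For readers less familiar with the two metrics mentioned above, the following is a minimal sketch of how sensitivity and specificity are computed for a binary malignancy label. The function name and the toy labels are hypothetical and purely illustrative; they are not taken from the paper or its datasets.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, predicted malignant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, predicted benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, false alarm
    # Sensitivity: share of malignant cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of benign cases that were correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels, not results from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A screening tool generally needs both values to be high: low sensitivity means missed cancers, while low specificity means unnecessary recalls.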

Takeaways, Limitations

Takeaways:
The paper demonstrates that large language models, including GPT-5, have potential for mammography VQA tasks.
The performance gain of GPT-5 over GPT-4o points to continued room for improvement in LLMs.
The results raise the possibility of LLMs supporting mammography image interpretation and clinical reasoning.
Limitations:
GPT-5's performance falls short of human expert levels.
Its low sensitivity and specificity make it difficult to apply in high-stakes clinical imaging settings.
Domain-specific adaptation and optimization are still required.