Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding

Created by
  • Haebom

Author

Ta Duc Huy, Duy Anh Huynh, Yutong Xie, Yuankai Qi, Qi Chen, Phi Le Nguyen, Sen Kim Tran, Son Lam Phung, Anton van den Hengel, Zhibin Liao, Minh-Son To, Yohan W.

Outline

This paper aims to improve the accuracy of Visual Grounding (VG) in medical images. We demonstrate that existing Vision-Language Models (VLMs) struggle to link disease regions with textual descriptions due to inefficient attention mechanisms and coarse token representations. Specifically, we show experimentally that background tokens receive high norms, drawing attention away from disease regions, and that global tokens under-represent local disease tokens. To address this, we propose a simple yet effective Disease-Aware Prompting (DAP) technique that leverages the VLM's explainability map to enhance disease-relevant regions and suppress background interference. Without any additional pixel-level annotations, DAP improves VG accuracy by 20.74% over state-of-the-art methods on three major chest X-ray datasets.
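The core idea of DAP, reweighting visual tokens with an explainability map so that disease regions are enhanced and background is suppressed, can be illustrated with a minimal sketch. This is not the authors' exact implementation; the function name, the linear weighting scheme, and the `alpha` strength parameter are all illustrative assumptions.

```python
import numpy as np

def disease_aware_prompt(patch_tokens: np.ndarray,
                         explain_map: np.ndarray,
                         alpha: float = 0.5) -> np.ndarray:
    """Sketch of disease-aware token reweighting (hypothetical API).

    patch_tokens: (N, D) array of visual patch tokens from a VLM.
    explain_map:  (N,) relevance scores from the VLM's explainability method.
    alpha:        strength of enhancement/suppression (assumed parameter).
    """
    # Normalize the relevance map to [0, 1] so it acts as a soft spatial mask.
    m = explain_map - explain_map.min()
    denom = m.max() if m.max() > 0 else 1.0
    m = m / denom
    # Map relevance to multiplicative weights in [1 - alpha, 1 + alpha]:
    # high-relevance (disease) tokens are boosted, low-relevance
    # (background) tokens are attenuated.
    weights = 1.0 + alpha * (2.0 * m - 1.0)
    return patch_tokens * weights[:, None]
```

In this toy form, a token whose relevance sits at the top of the map is scaled up and a background token is scaled down, which mirrors the paper's stated goal of redirecting attention without pixel-level supervision.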

Takeaways, Limitations

Takeaways:
We reveal that excessive attention to background tokens in VLMs and under-representation of local disease tokens are the main causes of poor VG performance.
We demonstrate that the DAP technique can significantly improve the performance of medical image VG without additional annotation.
The DAP technique is simple and effective, and can contribute to improving model interpretability and reliability in the field of medical image analysis.
Limitations:
The effectiveness of the DAP technique may be limited to certain types of medical images (chest X-rays) and VLMs.
Generalization performance needs to be verified for other medical image types or diseases.
DAP performance may be affected by the quality of the explainability map.