Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the site is run on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging

Created by
  • Haebom

Authors

Montasir Shams, Chashi Mahiul Islam, Shaeke Salman, Phat Tran, Xiuwen Liu

Outline

In this paper, we show that although Vision Transformers (ViTs) achieve excellent accuracy in medical image classification, their representations are semantically unclear, owing to the models' large size and complex self-attention mechanisms. Using a projected gradient-based algorithm, we show that ViT representations are semantically fragile and highly sensitive to subtle changes: images with imperceptible differences can have very different representations, while images that should belong to semantically different classes can have nearly identical representations. This vulnerability undermines the reliability of the classification results, and we show that even slight changes can reduce classification accuracy by more than 60%. This is the first study to systematically demonstrate the semantic deficiencies of ViT representations in medical image classification, and it raises important challenges for deploying ViTs in safety-critical systems.
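
To make the idea of a projected gradient-based search more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm: the one-layer tanh encoder `encode` is a hypothetical stand-in for a ViT, and the function `pgd_match_representation`, the L-infinity budget `eps`, the step size `alpha`, and the squared-distance loss are all illustrative assumptions. The sketch perturbs one image within a small budget so that its representation moves toward the representation of a different image, which mirrors the case where semantically different images end up with nearly identical representations.

```python
import numpy as np


def encode(W, x):
    """Hypothetical toy stand-in for a ViT embedding: a single tanh layer."""
    return np.tanh(W @ x)


def pgd_match_representation(W, x, x_target, eps=0.03, alpha=0.005, steps=100):
    """Search for delta with ||delta||_inf <= eps so that encode(x + delta)
    gets close to encode(x_target). Returns the best delta found and its loss."""
    target = encode(W, x_target)
    delta = np.zeros_like(x)
    best_delta = delta.copy()
    best_loss = float(np.sum((encode(W, x) - target) ** 2))
    for _ in range(steps):
        h = np.tanh(W @ (x + delta))
        r = h - target
        # Gradient of ||tanh(W(x + delta)) - target||^2 with respect to the input
        grad = W.T @ (2.0 * r * (1.0 - h ** 2))
        # Signed-gradient descent step on the representation distance
        delta = delta - alpha * np.sign(grad)
        # Projection: stay inside the L-inf ball and the valid pixel range [0, 1]
        delta = np.clip(delta, -eps, eps)
        delta = np.clip(x + delta, 0.0, 1.0) - x
        loss = float(np.sum((encode(W, x + delta) - target) ** 2))
        if loss < best_loss:
            best_loss, best_delta = loss, delta.copy()
    return best_delta, best_loss


# Usage: random toy "images" flattened to vectors (illustrative data only)
rng = np.random.default_rng(0)
d_in, d_rep = 256, 32
W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_rep, d_in))
x = rng.uniform(0.2, 0.8, size=d_in)        # source image
x_other = rng.uniform(0.2, 0.8, size=d_in)  # image from a different "class"
initial = float(np.sum((encode(W, x) - encode(W, x_other)) ** 2))
delta, final = pgd_match_representation(W, x, x_other)
print(f"max |delta| = {np.abs(delta).max():.3f}, distance {initial:.3f} -> {final:.3f}")
```

The signed-gradient step followed by clipping is the usual projected gradient descent (PGD) pattern; with a real ViT, the hand-derived gradient would instead come from automatic differentiation.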

Takeaways, Limitations

Takeaways: This is the first systematic investigation of the semantic vulnerabilities of ViTs applied to medical image classification, and it points to difficulties in using them in safety-critical systems. It also highlights the need for research on improving ViT models and ensuring their safety by accounting for their sensitivity to subtle changes.
Limitations: The results depend on a specific algorithm (a projected gradient-based algorithm). Generalizability to other medical image types and ViT architectures remains to be verified. The paper does not propose concrete solutions to the semantic vulnerabilities it identifies.