Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Disease Detection from Retinal Fundus Images

Created by
  • Haebom

Authors

Kerol Djoumessi, Samuel Ofosu Mensah, Philipp Berens

Outline

This paper proposes a hybrid model that combines a convolutional neural network (CNN) with a vision transformer (ViT) and is interpretable by design for medical image analysis. To address the interpretability challenges of existing hybrid models, the authors develop a fully convolutional CNN-transformer architecture in which interpretability is considered from the design stage. Applied to retinal disease detection, the model achieves better predictive performance than existing black-box and interpretable models, and it generates class-specific sparse evidence maps in a single forward pass. The code is released as open source to support reproducibility.
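The idea of class-specific evidence maps produced in a single forward pass can be illustrated with a small sketch. It assumes a 1x1 convolutional classification head followed by global average pooling, which is one common way to obtain such maps. The summary does not confirm that the paper uses exactly this head, and the function names, toy features, and weights below are hypothetical. The CNN and transformer stages are not modeled; the sketch starts from an already computed feature map.

```python
# Sketch only: an assumed 1x1-conv head with global average pooling,
# not the authors' confirmed implementation.

def evidence_maps(features, weights):
    """Apply a 1x1 convolution.

    features: H x W x C nested lists (assumed output of the backbone).
    weights:  K x C nested lists, one weight vector per class.
    Returns K evidence maps, each H x W.
    """
    return [
        [[sum(w * f for w, f in zip(class_w, pixel)) for pixel in row]
         for row in features]
        for class_w in weights
    ]

def logits_from_maps(maps):
    """Global average pooling: each class logit is the mean of its map."""
    return [sum(sum(row) for row in m) / (len(m) * len(m[0])) for m in maps]

def l1_penalty(maps):
    """Sum of absolute evidence values; adding this to a training loss
    is one assumed way to encourage sparse maps."""
    return sum(abs(v) for m in maps for row in m for v in row)

# Toy 2 x 2 feature map with 2 channels, and 2 classes (all values made up).
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 0.0]],
]
weights = [[1.0, -1.0], [0.5, 0.5]]

maps = evidence_maps(features, weights)
logits = logits_from_maps(maps)
penalty = l1_penalty(maps)
```

Under this assumption, each logit is exactly the spatial mean of its evidence map, so the map shows how much each image location contributed to the class score without a separate post-hoc attribution step.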

Takeaways, Limitations

Takeaways:
Presents an interpretable CNN-ViT hybrid model for medical image analysis, contributing to an understanding of the model's decision-making process.
Improves diagnostic reliability by providing class-specific local evidence maps along with better predictive performance than existing models.
Generates evidence maps efficiently through a single forward pass.
Supports reproducibility of the research through open source code.
Limitations:
The performance evaluation of the proposed model is limited to one type of medical image data (retinal fundus images for disease detection). Generalization to other types of medical image data still needs to be verified.
The interpretability of the model depends on the presented method, and a comparative analysis with other interpretation methods is required.
Because the model is currently specialized for retinal disease detection, further research is needed on its applicability to other medical image analysis problems.