Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars

Created by
  • Haebom

Authors

Barkin Buyukcakir, Jannick De Tobel, Patrick Thevissen, Dirk Vandermeulen, Peter Claes

Outline

This paper presents a framework to address the "black box" nature of deep learning models, which limits their practical adoption in high-stakes applications such as dental evaluation. Using the performance gap observed in the automated staging of mandibular second molars (tooth 37) and third molars (tooth 38) as a case study, the authors propose a framework combining a convolutional autoencoder (AE) with a vision transformer (ViT). The framework improves classification accuracy over the baseline ViT for both teeth, from 0.712 to 0.815 for tooth 37 and from 0.462 to 0.543 for tooth 38. Beyond the performance gains, analysis of the AE's latent-space metrics and image reconstructions reveals that the gap is data-driven: the high intraclass morphological variability of the tooth 38 dataset is a key limitation. The framework also exposes the inadequacy of relying on a single interpretation method, such as attention maps, and offers a practical tool to support expert decision-making by improving accuracy and surfacing the sources of model uncertainty.
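The latent-space analysis described above hinges on quantifying within-class spread of AE embeddings. As a minimal sketch of one such metric (the function name, metric choice, and synthetic data are illustrative assumptions, not the paper's actual implementation), one can measure the mean distance of each sample's latent vector to its class centroid:

```python
import numpy as np

def intraclass_dispersion(latents: np.ndarray, labels: np.ndarray) -> dict:
    """Mean distance of each sample's latent vector to its class centroid.

    Higher values indicate greater within-class morphological variability
    in the autoencoder's latent space. Illustrative sketch only; the
    paper's actual latent-space metrics may differ.
    """
    scores = {}
    for c in np.unique(labels):
        z = latents[labels == c]            # embeddings of one class
        centroid = z.mean(axis=0)           # class centroid in latent space
        scores[int(c)] = float(np.linalg.norm(z - centroid, axis=1).mean())
    return scores

# Synthetic example: a "tight" class and a "dispersed" class stand in for
# the per-stage embeddings of the tooth 37 vs. tooth 38 datasets.
rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.1, size=(50, 16))      # low intraclass variability
dispersed = rng.normal(0.0, 1.0, size=(50, 16))  # high intraclass variability
latents = np.vstack([tight, dispersed])
labels = np.array([0] * 50 + [1] * 50)
scores = intraclass_dispersion(latents, labels)
```

Under this metric, the dispersed class scores markedly higher than the tight one, mirroring the kind of data-driven gap the authors report between the two tooth datasets.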

Takeaways, Limitations

Takeaways:
• A framework combining a convolutional autoencoder with a vision transformer improves both the performance and the interpretability of deep learning models in dental evaluation.
• Latent-space analysis of the AE offers a novel way to diagnose dataset issues (e.g., high intraclass morphological variability) and to identify sources of model uncertainty.
• The work points out the limitations of a single interpretation method (e.g., attention maps) and underscores the importance of a multifaceted interpretability approach.
• The approach improves the reliability of deep learning models and suggests their potential for supporting experts in high-stakes applications.
Limitations:
• The performance improvement of the proposed framework may be limited to the specific datasets studied (teeth 37 and 38).
• Further research is needed to establish generalizability to other tooth types and other forensic applications.
• Additional data collection or data augmentation is needed to address the high intraclass morphological variability of the tooth 38 dataset.
• Interpretation of the AE's latent-space analysis may be subjective.