Daily Arxiv

This page curates AI-related papers published around the world.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

From $\mathcal{O}(n^{2})$ to $\mathcal{O}(n)$ Parameters: Quantum Self-Attention in Vision Transformers for Biomedical Image Classification

Created by
  • Haebom

Authors

Thomas Boucher, John Whittle, Evangelos B. Mazomenos

Outline

In this paper, we show that a Quantum Vision Transformer (QViT) built on a quantum self-attention (QSA) mechanism can compete with state-of-the-art (SOTA) biomedical image classifiers while using 99.99% fewer parameters. The QSA mechanism replaces the linear projections of classical self-attention (SA) with a parameterized quantum neural network (QNN), reducing the per-layer parameter count from O(n²) to O(n). On the RetinaMNIST dataset, the QViT outperforms 13 of 14 SOTA methods, including CNNs and ViTs, reaching 56.5% accuracy, only 0.88% below the best model, MedMamba, while using 99.99% fewer parameters (about 1K vs. 14.5M) and 89% fewer GFLOPs. We also apply knowledge distillation (KD) from classical to quantum vision transformers, for the first time in biomedical image classification, and show that QViTs retain performance comparable to classical ViTs across eight datasets spanning diverse modalities while improving QSA parameter efficiency. Higher-qubit architectures benefit more from KD pre-training, suggesting a scaling relationship between QSA parameter count and the effect of KD. These results establish QSA as a practical architectural choice for parameter-efficient biomedical image analysis.
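
To make the O(n²) → O(n) reduction concrete, below is a minimal sketch of a QSA-style layer in PennyLane + PyTorch. The circuit layout (angle embedding plus a shallow entangling ansatz), the qubit count, and all names (`qnn_projection`, `QuantumSelfAttention`) are illustrative assumptions, not the authors' implementation; the point is only that each projection carries N_LAYERS × N_QUBITS rotation angles, linear in the embedding width, where a classical linear projection would need n² weights.

```python
# Illustrative sketch only: circuit design and names are assumptions,
# not the paper's code.
import torch
import torch.nn as nn
import pennylane as qml

N_QUBITS = 4   # one qubit per embedding dimension (illustrative choice)
N_LAYERS = 2   # fixed circuit depth => parameters grow linearly in N_QUBITS

dev = qml.device("default.qubit", wires=N_QUBITS)

@qml.qnode(dev, interface="torch")
def qnn_projection(inputs, weights):
    # Angle-encode a token embedding, apply a shallow variational circuit,
    # and read out one Pauli-Z expectation per qubit as the projected vector.
    qml.AngleEmbedding(inputs, wires=range(N_QUBITS))
    qml.BasicEntanglerLayers(weights, wires=range(N_QUBITS))
    return [qml.expval(qml.PauliZ(w)) for w in range(N_QUBITS)]

class QuantumSelfAttention(nn.Module):
    """Self-attention whose Q/K/V projections are parameterized quantum
    circuits: each uses N_LAYERS * N_QUBITS rotation angles, i.e. O(n)
    parameters, instead of the O(n^2) weights of a classical linear map."""

    def __init__(self):
        super().__init__()
        shapes = {"weights": (N_LAYERS, N_QUBITS)}
        self.q_proj = qml.qnn.TorchLayer(qnn_projection, shapes)
        self.k_proj = qml.qnn.TorchLayer(qnn_projection, shapes)
        self.v_proj = qml.qnn.TorchLayer(qnn_projection, shapes)

    def forward(self, x):  # x: (seq_len, N_QUBITS) token embeddings
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / N_QUBITS**0.5, dim=-1)
        return attn @ v

# Quick smoke test on random "tokens".
tokens = torch.rand(8, N_QUBITS)
out = QuantumSelfAttention()(tokens)
print(out.shape)  # torch.Size([8, 4])
```

At this toy size, 3 × N_LAYERS × N_QUBITS = 24 trainable angles stand in for the 3 × n² = 48 weights of classical Q/K/V projections; the gap widens as the embedding grows, since one side scales linearly and the other quadratically.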

Takeaways, Limitations

Takeaways:
We demonstrate that QViTs leveraging the quantum self-attention (QSA) mechanism can achieve performance comparable to existing state-of-the-art biomedical image classifiers with orders of magnitude fewer parameters.
We demonstrate that knowledge distillation (KD) from classical to quantum vision transformers is effective in improving QViT performance (a minimal loss sketch follows this section).
We identify a scaling relationship between QSA parameter count and the benefit of KD pre-training.
We establish QSA as a practical architectural choice for parameter-efficient biomedical image analysis.
Limitations:
Generalization beyond the RetinaMNIST benchmark requires further study.
Results on larger and more complex biomedical image datasets are still needed.
The computational cost and practical implementability of the QSA mechanism require further analysis.
Because the experiments use limited datasets, the generalizability of the results needs additional validation.
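
For the KD takeaway above, here is a minimal sketch of a classical-to-quantum distillation objective, assuming the standard Hinton-style soft-target loss; the temperature `T`, mixing weight `alpha`, and the function name `kd_loss` are illustrative assumptions rather than the paper's exact recipe.

```python
# Illustrative Hinton-style KD loss; hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Distill a classical ViT teacher into a QViT student.

    Soft term: KL divergence between temperature-smoothed teacher and
    student distributions (scaled by T^2 so gradient magnitudes stay
    comparable across temperatures). Hard term: plain cross-entropy.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In KD pre-training, the frozen teacher's logits on each batch supply the soft targets; the takeaways above suggest the benefit of this signal grows with the number of QSA qubits.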