Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Multimodal Medical Image Classification via Synergistic Learning Pre-training

Created by
  • Haebom

Author

Qinghua Lin, Guang-Hai Liu, Zuoyong Li, Yang Li, Yuting Jiang, Xiang Wu

Outline

This paper proposes a semi-supervised multimodal medical image classification method built on a "pre-training + fine-tuning" framework, addressing the modality-fusion problem when expert annotations are scarce. A synergistic pre-training framework that integrates consistency, reconstruction, and alignment learning treats one modality as an augmented view of another, enabling self-supervised learning that strengthens the base model's feature representations. For fine-tuning, the authors design a multimodal fusion scheme that combines modality-specific feature extractors with a multimodal fusion feature extractor. To mitigate prediction uncertainty and the risk of overfitting caused by the shortage of labeled data, they further propose a distribution-shift method for the fused multimodal features. Experiments on the Kvasir and Kvasirv2 gastrointestinal endoscopy image datasets show that the proposed method outperforms existing state-of-the-art classification methods. The source code will be released on GitHub.
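To make the pre-training idea concrete, the sketch below combines the three losses named in the summary (consistency, reconstruction, alignment) over paired samples from two modalities, treating each modality as an augmented view of the other. This is a minimal illustrative toy, not the paper's architecture: the linear encoders/decoders, dimensions, loss weights, and the function name `synergistic_pretrain_loss` are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoders" and "decoders" for two imaging modalities,
# flattened to vectors. Shapes and weights are illustrative stand-ins.
D_IN, D_EMB = 64, 16
W_a = rng.standard_normal((D_IN, D_EMB)) * 0.1   # encoder, modality A
W_b = rng.standard_normal((D_IN, D_EMB)) * 0.1   # encoder, modality B
V_a = rng.standard_normal((D_EMB, D_IN)) * 0.1   # decoder back to A
V_b = rng.standard_normal((D_EMB, D_IN)) * 0.1   # decoder back to B

def mse(x, y):
    return float(np.mean((x - y) ** 2))

def cosine_alignment(z_a, z_b, eps=1e-8):
    """1 - mean cosine similarity of paired embeddings (lower is better)."""
    num = np.sum(z_a * z_b, axis=1)
    den = np.linalg.norm(z_a, axis=1) * np.linalg.norm(z_b, axis=1) + eps
    return float(np.mean(1.0 - num / den))

def synergistic_pretrain_loss(x_a, x_b, w_cons=1.0, w_rec=1.0, w_align=1.0):
    z_a, z_b = x_a @ W_a, x_b @ W_b          # per-modality embeddings
    # Consistency: each modality acts as an "augmented view" of the other,
    # so their embeddings should agree.
    l_cons = mse(z_a, z_b)
    # Reconstruction: each embedding should retain enough information to
    # reconstruct its own input.
    l_rec = mse(z_a @ V_a, x_a) + mse(z_b @ V_b, x_b)
    # Alignment: paired embeddings should point in the same direction.
    l_align = cosine_alignment(z_a, z_b)
    return w_cons * l_cons + w_rec * l_rec + w_align * l_align

# Paired samples: a second modality correlated with the first.
x_a = rng.standard_normal((8, D_IN))
x_b = x_a + 0.05 * rng.standard_normal((8, D_IN))
print(round(synergistic_pretrain_loss(x_a, x_b), 4))
```

In a real implementation the linear maps would be deep encoders trained by gradient descent on this combined objective, with the loss weights tuned per dataset.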

Takeaways, Limitations

Takeaways:
  • A novel method that effectively addresses the scarcity of labeled data in multimodal medical image classification
  • Improved feature representation through the synergistic learning pre-training framework
  • An effective fine-tuning and distribution-shift method for multimodal fusion
  • State-of-the-art performance on the Kvasir and Kvasirv2 datasets
  • Reproducibility and further research enabled by the released code

Limitations:
  • Performance may be limited to the specific datasets evaluated
  • Generalization across other medical imaging modalities and disease types still needs validation
  • Limitations of the self-supervised learning method used in pre-training may bound overall performance
  • Experiments on larger datasets are needed