Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DMS-Net: Dual-Modal Multi-Scale Siamese Network for Binocular Fundus Image Classification

Created by
  • Haebom

Authors

Guohao Huo, Zibo Lin, Zitong Wang, Ruiting Dai, Hao Tang

Outline

This paper proposes DMS-Net, a dual-modal multi-scale Siamese network for binocular fundus image classification. DMS-Net extracts deep semantic features from paired fundus images using a weight-sharing Siamese ResNet-152 backbone. To address lesion boundary ambiguity and scattered pathological distribution, the authors introduce a multi-scale context-aware module (MSCAM) that integrates adaptive pooling and attention mechanisms. A dual-modal feature fusion (DMFF) module then strengthens cross-modal interaction through spatial-semantic recalibration and bidirectional attention, combining global context with local edge features. Evaluated on the ODIR-5K dataset, DMS-Net achieves state-of-the-art performance with an accuracy of 82.9%, a recall of 84.5%, and a Cohen's kappa of 83.2%, which the authors present as evidence of a stronger ability to detect symmetric pathologies and to support clinical decision-making for ophthalmic diseases.
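The weight-sharing and multi-scale pooling ideas described above can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the toy `encode` function merely stands in for the ResNet-152 backbone, `multi_scale_pool` covers only the adaptive-pooling part of MSCAM (no attention), and plain concatenation replaces the DMFF fusion. All function names, weights, and inputs are hypothetical.

```python
def adaptive_avg_pool_1d(values, out_size):
    """Average `values` into `out_size` bins using the usual adaptive-pooling bin rule."""
    n = len(values)
    pooled = []
    for i in range(out_size):
        start = (i * n) // out_size                       # floor(i * n / out_size)
        end = ((i + 1) * n + out_size - 1) // out_size    # ceil((i + 1) * n / out_size)
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


def multi_scale_pool(values, scales=(1, 2, 4)):
    """Concatenate pooled summaries at several scales (coarse to fine)."""
    out = []
    for size in scales:
        out.extend(adaptive_avg_pool_1d(values, size))
    return out


def encode(pixels, weights):
    """Toy encoder (assumption): elementwise weighting followed by ReLU."""
    return [max(0.0, w * p) for w, p in zip(weights, pixels)]


def siamese_features(left, right, weights):
    """Apply the SAME weights to both eyes, then concatenate the two branches."""
    left_feat = multi_scale_pool(encode(left, weights))
    right_feat = multi_scale_pool(encode(right, weights))
    return left_feat + right_feat


# Hypothetical 4-value "images" for the left and right eye.
shared_weights = [0.5, -1.0, 2.0, 1.0]
left_eye = [1.0, 2.0, 3.0, 4.0]
right_eye = [4.0, 3.0, 2.0, 1.0]

features = siamese_features(left_eye, right_eye, shared_weights)
print(len(features), features)
```

Because both branches call `encode` with the same `shared_weights`, any update to those weights affects the left-eye and right-eye features identically, which is the defining property of a Siamese backbone.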

Takeaways, Limitations

Takeaways:
Improves the accuracy of ophthalmic disease diagnosis by exploiting the correlation between the left-eye and right-eye fundus images.
Addresses lesion boundary ambiguity and scattered pathological distribution with the MSCAM and DMFF modules.
Achieves state-of-the-art performance on the ODIR-5K dataset (82.9% accuracy, 84.5% recall, 83.2% Cohen's kappa).
Contributes to the diagnosis of ophthalmic diseases and supports clinical decision-making.
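For readers less familiar with the three metrics quoted in the takeaways, the sketch below shows how accuracy, recall, and Cohen's kappa are defined for a simple single-label case. It is only an illustration under stated assumptions: the labels are made up, recall is computed for one positive class, and the paper's exact evaluation protocol on ODIR-5K (for example, how classes are averaged) is not specified here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """True positives divided by all actual positives for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


def cohen_kappa(y_true, y_pred):
    """kappa = (p_o - p_e) / (1 - p_e): observed agreement corrected for chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability both "raters" pick the same class independently.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is this sketch's arbitrary choice
    return (p_o - p_e) / (1 - p_e)


# Made-up labels (1 = disease present, 0 = normal); not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

print(round(accuracy(y_true, y_pred), 3))
print(round(recall(y_true, y_pred), 3))
print(round(cohen_kappa(y_true, y_pred), 3))
```

Kappa is lower than raw accuracy whenever some of the agreement could be explained by chance, which is why it is often reported alongside accuracy on imbalanced medical datasets.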
Limitations:
Performance is validated only on the ODIR-5K dataset, so generalization to other datasets is uncertain.
The model's complexity may lead to high computational cost and resource consumption.
Further studies are needed to confirm that the performance generalizes across a wider range of ophthalmic conditions.