This paper proposes DMS-Net, a novel deep learning model for retinal disease diagnosis from binocular fundus images. Built on a Siamese ResNet-152 architecture, DMS-Net processes fundus images from both eyes simultaneously and exploits the pathological correlations between them. The model introduces the OmniPool Spatial Integrator Module (OSIM), which combines multi-scale adaptive pooling with spatial attention to handle unclear lesion boundaries and diffusely distributed pathology. The Calibrated Analogous Semantic Fusion Module (CASFM) strengthens the interaction between the binocular images and aggregates modality-independent representations. In addition, the Cross-Modal Contrastive Alignment Module (CCAM) and the Cross-Modal Integrative Alignment Module (CIAM) improve the aggregation of discriminative, lesion-correlated semantic information across the left and right fundus images. Evaluated on the ODIR-5K dataset, DMS-Net achieves state-of-the-art performance: 82.9% accuracy, 84.5% recall, and a Cohen's kappa coefficient of 83.2%.
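The central architectural idea, a Siamese network in which both eyes share one backbone before their features are fused, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the linear-ReLU "backbone" is a hypothetical stand-in for the shared ResNet-152, and fusion by simple concatenation replaces the paper's CASFM/CCAM/CIAM modules, whose internals the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "backbone" weights: in a Siamese design both eyes pass through
# the SAME network, so a single weight matrix serves both branches.
W_backbone = rng.standard_normal((2048, 128)) * 0.01
W_head = rng.standard_normal((256, 8)) * 0.01  # 8 ODIR-5K disease categories

def backbone(x):
    """Hypothetical stand-in for the shared ResNet-152 feature extractor."""
    return np.maximum(x @ W_backbone, 0.0)  # linear projection + ReLU

def siamese_forward(left_feat, right_feat):
    """Encode both eyes with the shared backbone, fuse, and classify.
    Concatenation fusion is an assumption; the paper performs a more
    elaborate cross-modal fusion via CASFM, CCAM, and CIAM."""
    z_left = backbone(left_feat)
    z_right = backbone(right_feat)
    fused = np.concatenate([z_left, z_right], axis=-1)  # (256,)
    return fused @ W_head  # one logit per disease category

# Placeholder per-eye feature vectors standing in for fundus images.
left = rng.standard_normal(2048)
right = rng.standard_normal(2048)
logits = siamese_forward(left, right)
print(logits.shape)  # (8,)
```

Because the backbone weights are shared, the model treats left and right eyes symmetrically at the feature-extraction stage; only the fusion and classification stages see them jointly, which is where the cross-modal modules operate.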