This paper proposes MI-ND, a model for medical image denoising. MI-ND combines multi-scale convolution with transformer architectures and introduces two components: a noise-level estimator (NLE) and a noise-adaptive attention block (NAAB). Together, these components adjust channel and spatial attention according to the estimated noise level and support cross-modal feature fusion. Experiments on several publicly available datasets show that MI-ND significantly outperforms comparable methods on the image quality metrics PSNR, SSIM, and LPIPS, and improves F1 score and ROC-AUC on downstream diagnostic tasks. The method also performs well in structural recovery, diagnostic sensitivity, and cross-modal robustness, which makes it a practical option for medical image enhancement and AI-assisted diagnosis and treatment.
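For readers unfamiliar with the image quality metrics named above, the following is a minimal sketch of PSNR using its standard definition, 10 · log10(MAX² / MSE). It is not taken from the MI-ND implementation: the function name `psnr`, the `max_value` default of 255 (8-bit images), and the toy 2×2 pixel arrays are assumptions made purely for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images.

    Images are given as lists of rows of pixel intensities.
    max_value is the largest possible intensity (assumed 255 for 8-bit data).
    """
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of rows")

    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("rows must have the same length")
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1

    if count == 0:
        raise ValueError("images must not be empty")

    mse = squared_error / count
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical toy data, not drawn from any dataset in the paper.
clean = [[100, 120], [140, 160]]
noisy = [[110, 110], [150, 150]]      # every pixel off by 10
denoised = [[102, 118], [141, 161]]   # every pixel off by 1 or 2

print(f"noisy:    {psnr(clean, noisy):.2f} dB")
print(f"denoised: {psnr(clean, denoised):.2f} dB")
```

A higher PSNR means the estimate is closer to the reference in the mean-squared-error sense. SSIM and LPIPS complement it by measuring structural and perceptual similarity, which PSNR does not capture.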