This paper addresses remote sensing image change description, a multimodal task that generates natural-language descriptions of the differences between bi-temporal images. We point out that existing deep learning-based methods concentrate on increasingly complex network module design and rely on empirical experimentation and iterative network tuning, and we instead propose a new paradigm based on diffusion models, which learn the underlying data distribution directly. The method comprises three main components: a multi-scale change detection module, a diffusion model, and a frequency-guided complex filter module that suppresses high-frequency noise. Experiments on several remote sensing datasets verify that the proposed method outperforms existing methods.
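The frequency-guided complex filter module is described only at a high level here. A minimal sketch of one common realization of such a component (a learnable complex-valued mask applied in the Fourier domain, in the spirit of global-filter layers) might look like the following; the class name, shapes, and initialization are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class FrequencyGuidedFilter(nn.Module):
    """Illustrative frequency-domain filter: FFT -> learnable complex mask -> inverse FFT.

    A sketch only; the paper's module may differ in structure and parameterization.
    """
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # Learnable complex-valued filter over the rFFT spectrum
        # (the last axis has width // 2 + 1 frequency bins).
        self.filter = nn.Parameter(
            torch.randn(channels, height, width // 2 + 1, dtype=torch.cfloat) * 0.02
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) real-valued feature map
        spec = torch.fft.rfft2(x, norm="ortho")   # transform to the frequency domain
        spec = spec * self.filter                 # attenuate/boost bands, e.g. damp high-frequency noise
        # Transform back to the spatial domain at the original resolution.
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

# Usage: filter a (hypothetical) bi-temporal difference feature map
feat = torch.randn(2, 64, 32, 32)
out = FrequencyGuidedFilter(64, 32, 32)(feat)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Filtering in the Fourier domain makes high-frequency suppression a simple elementwise multiplication, which is one plausible motivation for a frequency-guided design over purely spatial convolutions.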