Existing SMILES-based diffusion models for molecule generation support only single-modal constraints. To overcome this limitation, this paper proposes Cross-Modality Controlled Molecule Generation with Diffusion Language Model (CMCM-DLM), which supports multi-modal constraints and allows new constraints to be added. CMCM-DLM adds a Structure Control Module (SCM) and a Property Control Module (PCM) to a pre-trained diffusion model and applies constraints of different modalities, such as molecular structure and chemical properties, in stages. The SCM establishes the molecular skeleton in the early stage of generation, and the PCM then adjusts the chemical properties of the generated molecules toward target values in the later stage. Experimental results demonstrate the efficiency and adaptability of CMCM-DLM, suggesting a significant advance in molecule generation for new drug discovery.
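The staged use of the two modules can be illustrated with a minimal sketch of a control schedule over the denoising steps. This is not the paper's implementation: the function name `phased_control_schedule`, the parameter `switch_fraction`, and the assumption that the two phases are disjoint with a single switch point are all hypothetical, since the abstract states only that the SCM acts in the early stage and the PCM in the later stage.

```python
from typing import List


def phased_control_schedule(num_steps: int, switch_fraction: float = 0.5) -> List[str]:
    """Return the control module assumed to be active at each denoising step.

    Index 0 is the first denoising step (starting from noise) and the last
    index is the final step. `switch_fraction` is a hypothetical parameter:
    the share of steps handled by the SCM before the PCM takes over.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not 0.0 <= switch_fraction <= 1.0:
        raise ValueError("switch_fraction must lie in [0, 1]")

    # Assumption: one hard switch from structure control to property control.
    switch_step = int(num_steps * switch_fraction)
    return ["SCM" if step < switch_step else "PCM" for step in range(num_steps)]


if __name__ == "__main__":
    # Print the module assumed to guide each of 10 denoising steps.
    for step, module in enumerate(phased_control_schedule(10, 0.5)):
        print(step, module)
```

In a sampler built on such a schedule, each step would call the pre-trained denoiser together with whichever module is active, so the skeleton is fixed before the properties are adjusted. Where the switch actually occurs, and whether the phases overlap, is not stated in the abstract.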