In this paper, we present a novel method for detecting AI-generated music, addressing the copyright and industry-wide issues raised by the advancement of AI-based music generation tools. Existing detectors have clear limitations: audio-based detectors generalize poorly and are vulnerable to noise, while lyrics-based detectors require accurate lyrics data, which is often unavailable. To overcome these limitations, we propose a multi-modal, modular post-fusion pipeline that combines automatically transcribed song lyrics with speech features that capture lyrics-related information in the audio. By drawing directly on the lyrical content of the audio, the method is less sensitive to low-level artifacts, which improves its robustness and practical applicability. Experimental results show that the proposed DE-detect method outperforms existing lyrics-based detectors and is more robust to audio noise. The code is available on GitHub.
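To illustrate what a modular post-fusion (late-fusion) step can look like, the sketch below combines per-modality scores only after each branch has made its own prediction. It is a minimal illustration, not the DE-detect implementation: it assumes, hypothetically, that each branch (e.g. a lyrics-transcript classifier and a speech-feature classifier) outputs a probability that the track is AI-generated, and that fusion is a weighted average in logit space. The modality names and weights are illustrative placeholders.

```python
import math
from typing import Dict


def sigmoid(x: float) -> float:
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float, eps: float = 1e-6) -> float:
    """Convert a probability to a logit, clamping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def late_fusion(probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical post-fusion: weighted average of per-modality logits.

    Only modalities present in `probs` contribute, so a branch can be
    dropped or swapped without retraining the others (the "modular" part).
    """
    total = sum(weights[m] for m in probs)
    if total <= 0:
        raise ValueError("weights of the available modalities must sum to a positive value")
    z = sum(weights[m] * logit(p) for m, p in probs.items()) / total
    return sigmoid(z)


if __name__ == "__main__":
    # Illustrative scores from two hypothetical branches.
    scores = {"lyrics": 0.8, "speech": 0.6}
    weights = {"lyrics": 0.5, "speech": 0.5}
    print(f"P(AI-generated) = {late_fusion(scores, weights):.3f}")
```

Because fusion happens on branch outputs rather than on raw features, each branch can be trained and replaced independently, and a missing modality (for instance, a track with no intelligible vocals) simply drops out of the weighted average.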