This paper presents a comprehensive, multidisciplinary review of evaluation methods for music generation systems, which have recently attracted considerable attention. It examines the common evaluation objectives, methodologies, and metrics used to assess both system output and model usage, covering subjective and objective, qualitative and quantitative, and empirical and computational approaches. The strengths and limitations of each approach are analyzed from the perspectives of musicology, engineering, and human-computer interaction (HCI).