This paper provides a comprehensive overview of how well recent deep learning models capture emotion in music and the challenges researchers face in this area. We discuss music emotion datasets, evaluation criteria, and competitions, briefly introduce recent music emotion prediction models, and offer insights into the main modeling approaches. We also examine the impact of different modalities, such as audio, MIDI, and physiological signals, on the effectiveness of emotion prediction models, and identify ongoing challenges in music emotion recognition (MER), including dataset quality, annotation consistency, the ambiguity of emotion labels, and difficulties in generalizing across datasets. We argue that standardized benchmarks, larger and more diverse datasets, and improved model interpretability are necessary for future progress in MER, and we provide a GitHub repository containing a curated list of music emotion datasets and recent prediction models.