This paper presents AImoclips, a benchmark for evaluating how well text-to-music (TTM) systems convey intended emotions to listeners. Six state-of-the-art TTM systems were used to generate over 1,000 music clips from 12 emotional intents, and 111 participants rated the perceived valence and arousal of each clip on 9-point Likert scales. Results showed that commercial systems tended to produce music more pleasant than intended, whereas open-source systems exhibited the opposite tendency. All systems conveyed emotions more accurately under high-arousal conditions, and all systems showed a bias toward emotional neutrality. AImoclips provides insight into the emotion-rendering characteristics of each model and supports the future development of emotionally congruent TTM systems.