This paper proposes EmoVoice, a novel emotion-controllable TTS model. EmoVoice leverages a large language model (LLM) to enable free-form, fine-grained natural-language emotion control. Furthermore, inspired by Chain-of-Thought (CoT) and Chain-of-Modality (CoM) techniques, we design a phoneme-boosted variant that outputs phoneme tokens and audio tokens in parallel to enhance content consistency. We also introduce EmoVoice-DB, a high-quality 40-hour English emotional speech dataset containing expressive speech, fine-grained emotion labels, and natural-language descriptions. EmoVoice achieves state-of-the-art performance on the English EmoVoice-DB test set using only synthetic training data, and on the Chinese SECap test set using our own data. Moreover, we investigate the reliability of existing emotion evaluation metrics and their alignment with human perceptual preferences, and assess two state-of-the-art multimodal LLMs, GPT-4o-audio and Gemini, as evaluators of emotional speech. The dataset, code, checkpoints, and demo samples are available on GitHub.
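The abstract does not specify how the parallel phoneme/audio token output is realized; as a minimal illustrative sketch (not the paper's implementation), one plausible reading is two classification heads over a shared LLM decoder state, predicting a phoneme token and an audio codec token at each decoding step. All class names, vocabulary sizes, and dimensions below are hypothetical.

```python
import torch
import torch.nn as nn

class ParallelPhonemeAudioHead(nn.Module):
    """Two output heads over a shared decoder hidden state: one for
    phoneme tokens, one for audio (codec) tokens, predicted in parallel
    at every decoding step. Hypothetical sketch, not EmoVoice's code."""

    def __init__(self, d_model: int, phoneme_vocab: int, audio_vocab: int):
        super().__init__()
        self.phoneme_head = nn.Linear(d_model, phoneme_vocab)
        self.audio_head = nn.Linear(d_model, audio_vocab)

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq_len, d_model) from the LLM backbone
        phoneme_logits = self.phoneme_head(hidden)
        audio_logits = self.audio_head(hidden)
        return phoneme_logits, audio_logits


# Toy usage: one greedy decoding step over a random hidden state.
head = ParallelPhonemeAudioHead(d_model=512, phoneme_vocab=128, audio_vocab=1024)
hidden = torch.randn(1, 1, 512)        # last-step hidden state (batch=1)
ph_logits, au_logits = head(hidden)
ph_token = ph_logits.argmax(dim=-1)    # greedy phoneme token, shape (1, 1)
au_token = au_logits.argmax(dim=-1)    # greedy audio token, shape (1, 1)
```

The intuition behind such a design, in the spirit of CoT/CoM, is that forcing the model to commit to a phoneme sequence alongside the acoustic tokens gives the audio stream an explicit linguistic scaffold, which is one way the content consistency the abstract describes could be improved.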