This paper highlights the limited and coarse emotional control offered by existing Text-to-Speech (TTS) systems and proposes EmoSteer-TTS, a novel method that enables fine-grained voice emotion control (transformation, interpolation, and deletion) without any training. EmoSteer-TTS alters the emotional tone of synthesized speech by modifying the internal activations of a flow-matching-based TTS model. We develop an efficient, training-free algorithm comprising activation extraction, emotion token retrieval, and inference-time steering, which is compatible with a wide range of pretrained models. By constructing an emotional speech dataset covering diverse speakers, we derive effective steering vectors. Experimental results demonstrate fine-grained, interpretable, and continuous voice emotion control that outperforms state-of-the-art (SOTA) methods. To the best of our knowledge, this is the first method to achieve fine-grained, continuous emotion control in TTS without training.
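The general idea of activation steering named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes steering vectors are derived as a difference of mean activations between emotional and neutral speech, and that a scalar `alpha` (hypothetical) scales emotion intensity, enabling interpolation (small positive values), transformation (full strength), and deletion (negative values).

```python
import numpy as np

def derive_steering_vector(emotional_acts: np.ndarray,
                           neutral_acts: np.ndarray) -> np.ndarray:
    """Derive a steering vector from activations of shape
    [num_examples, hidden_dim] captured at a chosen model layer.
    Assumption: the vector is the difference of mean activations;
    the paper's exact derivation may differ."""
    return emotional_acts.mean(axis=0) - neutral_acts.mean(axis=0)

def steer(hidden: np.ndarray, vector: np.ndarray,
          alpha: float) -> np.ndarray:
    """Shift hidden activations at inference time.
    alpha > 0 injects the emotion (magnitude controls intensity,
    allowing continuous interpolation); alpha < 0 suppresses it."""
    return hidden + alpha * vector
```

Because the intervention is a simple vector addition applied at inference, no gradient updates to the pretrained model are needed, which is what makes the approach training-free.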