This paper presents ColorCtrl, a training-free method for accurate and consistent text-guided color editing in images and videos. ColorCtrl leverages the attention mechanism of the multi-modal diffusion transformer (MM-DiT) to disentangle structure from color: by manipulating attention maps and value tokens, it achieves faithful, consistent color edits together with word-level control over attribute intensity. ColorCtrl modifies only the regions specified by the prompt, leaving irrelevant regions intact. It outperforms existing training-free methods on SD3 and FLUX.1-dev, surpasses commercial models such as FLUX.1 Kontext Max and GPT-4o Image Generation in edit consistency, and extends to video models such as CogVideoX, where it improves temporal consistency and editing stability. It also generalizes to instruction-based editing diffusion models such as Step1X-Edit and FLUX.1 Kontext dev, demonstrating its versatility.
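To make the abstract's core mechanism concrete, below is a minimal PyTorch sketch, not the authors' implementation, of how structure/color disentanglement via attention-map and value-token manipulation could look in an attention layer: the attention map is taken from a source (reconstruction) branch to preserve layout, while value tokens come from the edit branch to inject the new color. The function name, the `attr_token_ids` argument, and the `intensity` scaling are illustrative assumptions.

```python
# Hedged sketch of attention-based structure/color disentanglement.
# Assumption: the attention map carries structure; value tokens carry
# appearance. Injecting the source branch's attention map into the edit
# branch keeps structure fixed while color follows the edit prompt.
import torch

def disentangled_attention(q_src, k_src, v_edit,
                           attr_token_ids=None, intensity=1.0):
    """Structure from the source branch, color from the edit branch.

    q_src, k_src:   queries/keys of the source (reconstruction) branch
    v_edit:         value tokens of the edit branch (carry appearance)
    attr_token_ids: indices of the attribute word's tokens, whose value
                    tokens are rescaled for word-level intensity control
    """
    d = q_src.shape[-1]
    # Attention map from the SOURCE branch -> preserves layout/structure.
    attn = torch.softmax(q_src @ k_src.transpose(-1, -2) / d**0.5, dim=-1)
    # Word-level attribute intensity: rescale the value tokens of the
    # attribute word (e.g. "red") before aggregation (assumed mechanism).
    if attr_token_ids is not None:
        v_edit = v_edit.clone()
        v_edit[..., attr_token_ids, :] *= intensity
    # Aggregate EDIT-branch values -> injects the new color/appearance.
    return attn @ v_edit

# Toy usage: batch 1, 2 heads, 16 tokens, head dim 8.
B, H, T, D = 1, 2, 16, 8
q_src, k_src = torch.randn(B, H, T, D), torch.randn(B, H, T, D)
v_edit = torch.randn(B, H, T, D)
out = disentangled_attention(q_src, k_src, v_edit,
                             attr_token_ids=[3], intensity=1.5)
print(out.shape)  # torch.Size([1, 2, 16, 8])
```

The two-branch design mirrors the abstract's claim that edits stay local: regions whose structure is governed by the source attention map are reproduced unchanged unless the edit prompt's value tokens alter their appearance.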