Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Created by
  • Haebom

Authors

Zixin Yin, Xili Dai, Ling-Hao Chen, Deyu Zhou, Jianan Wang, Duomin Wang, Gang Yu, Lionel M. Ni, Heung-Yeung Shum

Outline

This paper presents ColorCtrl, a training-free method for text-guided color editing in images and videos. ColorCtrl leverages the attention mechanism of the Multi-Modal Diffusion Transformer (MM-DiT) to disentangle structure from color. By manipulating attention maps and value tokens, it achieves accurate, consistent color edits and word-level control over attribute intensity. Only the regions specified by the prompt are modified; unrelated regions are left intact. On SD3 and FLUX.1-dev, ColorCtrl outperforms existing training-free methods. It also surpasses commercial models such as FLUX.1 Kontext Max and GPT-4o Image Generation in consistency, and it extends to video models such as CogVideoX, where it improves temporal consistency and editing stability. Finally, it generalizes to instruction-based editing diffusion models such as Step1X-Edit and FLUX.1 Kontext dev, demonstrating its versatility.
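To make the idea of "manipulating attention maps and value tokens" more concrete, here is a toy sketch in plain Python. It is not the paper's algorithm. It only illustrates one plausible reading of the summary: reuse the attention map from the source pass to keep structure, take value tokens from the edited pass only inside an edit mask, and scale the attention given to one text token to adjust attribute strength. All function names, the mask, and the tiny 2-token inputs are assumptions made for illustration.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_map(queries, keys):
    # Scaled dot-product attention weights, one row per query token.
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        for q in queries
    ]

def apply_attention(attn, values):
    # Weighted sum of value tokens for each query token.
    dim = len(values[0])
    return [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(dim)]
        for row in attn
    ]

def mix_values(v_source, v_target, edit_mask):
    # Hypothetical helper: use target (edited) values only where the mask is True.
    return [vt if m else vs for vs, vt, m in zip(v_source, v_target, edit_mask)]

def reweight_token(attn, token_index, strength):
    # Hypothetical helper: scale attention to one token, then renormalize each row.
    out = []
    for row in attn:
        scaled = [w * strength if j == token_index else w for j, w in enumerate(row)]
        total = sum(scaled)
        out.append([w / total for w in scaled])
    return out

# Toy inputs (assumed): two tokens with two-dimensional features.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v_source = [[1.0, 0.0], [0.0, 1.0]]
v_target = [[0.5, 0.5], [0.2, 0.8]]

attn = attention_map(q, k)                       # "structure" from the source pass
mixed = mix_values(v_source, v_target, [False, True])
edited = apply_attention(attn, mixed)            # new values only at token 1
stronger = reweight_token(attn, token_index=1, strength=2.0)
```

With an all-False mask the output equals the unedited source result, which mirrors the claim that regions outside the prompt's target stay intact. How the real method derives the mask and which layers it touches is not described in this summary.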

Takeaways, Limitations

Takeaways:
Enables accurate and consistent text-based color editing of images and videos without training.
Achieves better editing quality and consistency than existing training-free methods, and better consistency than commercial models.
Supports word-level control of attribute intensity.
Modifies only the specified regions and leaves unrelated regions unchanged.
Applies to a range of image and video editing models, including instruction-based ones.
Improves temporal consistency and editing stability in video editing.
Limitations:
The paper does not explicitly state its limitations. Further research may be needed to identify them and to improve performance.