Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer

Created by
  • Haebom

Author

Haoxuan Wang, Jinlong Peng, Qingdong He, Hao Yang, Ying Jin, Jiafu Wu, Xiaobin Hu, Yanjie Pan, Zhenye Gan, Mingmin Chi, Bo Peng, Yabiao Wang

Outline

This paper presents UniCombine, a framework that improves the controllability of diffusion-based image generation by consistently combining multiple conditions, such as text prompts, spatial maps, and reference images. UniCombine builds on DiT (Diffusion Transformer) and introduces a novel Conditional MMDiT Attention mechanism together with trainable LoRA (Low-Rank Adaptation) modules, so the framework supports both a training-free and a training-based mode. The authors also construct a new multi-condition dataset, SubjectSpatial200K, and show through experiments on it that UniCombine achieves state-of-the-art performance.
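To make the two named components more concrete, here is a minimal conceptual sketch in PyTorch, not the authors' implementation: a joint attention that concatenates the main (text + latent) token stream with per-condition token streams, where each condition branch projects through a LoRA-adapted layer so only small low-rank adapters are trained. The class names (LoRALinear, ConditionalJointAttention), the LoRA rank, and the exact token-concatenation layout are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank (LoRA) update."""
    def __init__(self, in_dim: int, out_dim: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        for p in self.base.parameters():
            p.requires_grad_(False)               # backbone weights stay frozen
        self.down = nn.Linear(in_dim, rank, bias=False)
        self.up = nn.Linear(rank, out_dim, bias=False)
        nn.init.zeros_(self.up.weight)            # LoRA starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

class ConditionalJointAttention(nn.Module):
    """Joint attention over [main tokens | cond_1 | ... | cond_k].

    Condition branches use LoRA-adapted QKV projections, so combining or
    adding conditions only trains the small low-rank adapters (a rough
    reading of the training-free / training-based modes described above).
    """
    def __init__(self, dim: int, num_heads: int = 8,
                 num_conditions: int = 2, rank: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv_main = nn.Linear(dim, dim * 3)   # shared for text + latent
        self.qkv_cond = nn.ModuleList(
            [LoRALinear(dim, dim * 3, rank) for _ in range(num_conditions)]
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, main_tokens, cond_tokens_list):
        # Project each stream, then attend over the concatenated sequence.
        qkv = [self.qkv_main(main_tokens)]
        qkv += [f(c) for f, c in zip(self.qkv_cond, cond_tokens_list)]
        q, k, v = torch.cat(qkv, dim=1).chunk(3, dim=-1)
        B, N, D = q.shape
        h = self.num_heads
        q, k, v = (t.view(B, N, h, D // h).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(B, N, D)
        # Keep only the main-stream tokens for the next transformer block.
        return self.proj(out[:, : main_tokens.shape[1]])

# Example: one main stream with two condition branches (hypothetical shapes).
attn = ConditionalJointAttention(dim=64, num_conditions=2)
main = torch.randn(1, 16, 64)                     # e.g. latent patch tokens
conds = [torch.randn(1, 16, 64), torch.randn(1, 16, 64)]
print(attn(main, conds).shape)                    # torch.Size([1, 16, 64])
```

Zero-initializing the LoRA up-projection makes each adapter start as an identity update, which is the usual way to add condition branches to a frozen backbone without disturbing its original behavior.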

Takeaways, Limitations

Takeaways:
A new framework, UniCombine, is presented that controls image generation by effectively combining multiple conditions (text, spatial information, reference images, etc.).
The LoRA-based implementation supports a training-free mode, improving efficiency.
A new dataset for multi-condition image generation, SubjectSpatial200K, is released.
Experimental results show superior performance over existing methods.
Limitations:
The SubjectSpatial200K dataset needs to be expanded further in future work.
Further research is needed on consistency issues that may arise when combining different conditions.
Further research is needed on how well the framework generalizes to other diffusion models and other types of conditional inputs.