In this paper, we present UniCombine, a framework that enables consistent image generation by effectively combining multiple conditions (text prompts, spatial maps, reference images, etc.), with the goal of improving the controllability of diffusion models for image generation. UniCombine builds on the DiT (Diffusion Transformer) architecture and introduces a novel Conditional MMDiT Attention mechanism together with a trainable LoRA (Low-Rank Adaptation) module, allowing it to operate in both a training-free and a training-based mode. In addition, we construct SubjectSpatial200K, a new dataset covering diverse condition types, and show experimentally that UniCombine achieves state-of-the-art performance.
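To make the two core ideas concrete, here is a minimal, hypothetical sketch of how MMDiT-style joint attention over denoising and condition tokens could pair with a LoRA adapter. The class and parameter names (`LoRALinear`, `ConditionalJointAttention`, `cond_proj`, `lora_rank`) are illustrative assumptions, not the paper's actual implementation; the real Conditional MMDiT Attention and LoRA module may differ in structure and detail.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA wrapper: frozen base projection plus a low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base DiT weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # LoRA starts as a zero update

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

class ConditionalJointAttention(nn.Module):
    """Sketch of MMDiT-style joint attention over denoising + condition tokens.

    Condition-branch tokens are projected (through a LoRA-adapted layer),
    concatenated with the denoising tokens along the sequence axis, and
    attended to jointly, so conditions influence the denoising stream
    without a separate cross-attention module.
    """
    def __init__(self, dim: int, num_heads: int = 8, lora_rank: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # LoRA only on the condition-branch projection: with the update at
        # zero the block reduces to the frozen base (the training-free mode),
        # and training the LoRA weights gives the training-based mode.
        self.cond_proj = LoRALinear(nn.Linear(dim, dim), rank=lora_rank)

    def forward(self, denoise_tokens, cond_token_list):
        conds = [self.cond_proj(c) for c in cond_token_list]
        seq = torch.cat([denoise_tokens] + conds, dim=1)
        out, _ = self.attn(seq, seq, seq)
        # Keep only the denoising-stream outputs.
        return out[:, : denoise_tokens.shape[1]]

# Usage: two condition branches (e.g. a spatial map and a reference image),
# each already encoded into token sequences of the model dimension.
x = torch.randn(1, 256, 512)                  # noisy-latent tokens
conds = [torch.randn(1, 64, 512) for _ in range(2)]
block = ConditionalJointAttention(dim=512)
print(block(x, conds).shape)                  # torch.Size([1, 256, 512])
```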
Takeaways, Limitations
•
Takeaways:
◦ UniCombine, a new framework that controls image generation by effectively combining multiple conditions (text, spatial information, images, etc.)
◦ A LoRA-based implementation of UniCombine that can operate training-free, improving efficiency
◦ SubjectSpatial200K, a new dataset for multi-conditional image generation, is released
◦ Experimental results show superior performance over existing methods
•
Limitations:
◦ The SubjectSpatial200K dataset needs to be expanded further in the future.
◦ Consistency issues that may arise when combining different conditions need further study.
◦ Further research is needed on how well the proposed framework generalizes to other diffusion models and other types of conditional inputs.