Daily Arxiv

This is a page that curates AI-related papers published worldwide.
Summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

OmniGen2: Exploration to Advanced Multimodal Generation

Created by
  • Haebom

Author

Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, Ze Liu, Ziyi

Outline

OmniGen2 is a versatile open-source generative model that offers a unified solution for diverse generative tasks, including text-to-image generation, image editing, and in-context generation. Unlike OmniGen v1, it features two distinct decoding pathways for the text and image modalities, with unshared parameters and a decoupled image tokenizer. This design lets OmniGen2 build on existing multimodal understanding models without re-adapting VAE inputs, preserving their original text generation capabilities.

To train OmniGen2, the authors developed comprehensive data construction pipelines covering image editing and in-context generation data. They also introduce a reflection mechanism tailored to image generation tasks and curate a dedicated reflection dataset based on OmniGen2.

Despite its relatively modest parameter count, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image generation and image editing. To further evaluate in-context generation (also known as subject-driven tasks), the authors introduce a new benchmark called OmniContext, on which OmniGen2 achieves state-of-the-art consistency among open-source models. The model, training code, datasets, and data construction pipeline will be released to support future research in this area.
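The two ideas in the summary above — decoupled text/image decoding paths with unshared parameters, and a reflection loop that critiques and retries image generation — can be sketched schematically. This is a minimal illustration based only on the summary; every class and function name here is a hypothetical stand-in, and the real model's transformer/diffusion components are stubbed out as plain Python.

```python
class TextDecoder:
    """Stand-in for the autoregressive text path of the underlying
    multimodal understanding model (kept untouched in this design)."""
    def generate(self, prompt):
        return f"text-response({prompt})"


class ImageDecoder:
    """Stand-in for the separate image-generation path, which uses its
    own VAE-based image tokenizer rather than the text path's inputs."""
    def vae_tokenize(self, image):
        return f"vae-latents({image})"

    def generate(self, prompt, ref_images=()):
        # Reference images (for editing / in-context generation) are
        # encoded by the image path's own tokenizer.
        latents = [self.vae_tokenize(img) for img in ref_images]
        return f"image({prompt}, refs={latents})"


class OmniGen2Sketch:
    """Routes each request to one of two decoders with unshared
    parameters, so adding image generation never modifies the text
    path -- the property the summary attributes to OmniGen2's design."""
    def __init__(self):
        self.text_path = TextDecoder()
        self.image_path = ImageDecoder()

    def __call__(self, prompt, want_image=False, ref_images=()):
        if want_image:
            return self.image_path.generate(prompt, ref_images)
        return self.text_path.generate(prompt)


def generate_with_reflection(model, prompt, critic, max_rounds=2):
    """Hypothetical reflection loop: generate an image, ask a critic
    (in OmniGen2, the model's own assessment) for feedback, and retry
    with the critique folded into the prompt until it is satisfied."""
    image = model(prompt, want_image=True)
    for _ in range(max_rounds):
        critique = critic(prompt, image)
        if critique is None:  # critic is satisfied
            return image
        image = model(f"{prompt} [revise: {critique}]", want_image=True)
    return image
```

The key design point mirrored here is that the routing layer never shares weights between the two paths, which is why (per the summary) the text capability survives the addition of image generation.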

Takeaways, Limitations

Takeaways:
  • Provides a unified solution for a wide range of generative tasks.
  • Enables efficient training by building on existing multimodal understanding models without re-adapting VAE inputs.
  • Introduces a reflection mechanism for image generation, together with a dedicated reflection dataset.
  • Achieves state-of-the-art consistency among open-source models on the OmniContext benchmark.
  • Supports future research by openly releasing the model, training code, datasets, and data construction pipeline.
Limitations:
  • Performance may be constrained by the relatively small parameter count.
  • The generality and reliability of the new OmniContext benchmark require further validation.
  • Quantitative comparison of OmniGen2 against other state-of-the-art models may be insufficient.