Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching

Created by
  • Haebom

Author

Run Luo, Xiaobo Xia, Lu Wang, Longze Chen, Renke Shan, Jing Luo, Min Yang, Tat-Seng Chua

Outline

This paper argues that next-generation multimodal foundation models capable of any-to-any, bidirectional, multi-turn interaction will be a core component of artificial general intelligence systems. To overcome the limitations of existing models, it introduces NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow matching. Leveraging metric-induced probability paths and kinetic optimal velocities, NExT-OMNI natively supports any-to-any understanding and generation with improved response efficiency, and its concise unified representation, rather than a task-separated design, makes it applicable to a wide range of scenarios. Trained on large-scale text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks and outperforms previous unified models in multi-turn multimodal interaction and cross-modal retrieval. To facilitate further research, the authors release the training details, data protocols, code, and model checkpoints.
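
Since the summary centers on discrete flow matching, a rough intuition may help. The sketch below is a hypothetical, minimal mask-based discrete flow matching corruptor and sampler in PyTorch. It reflects a common instantiation from the discrete flow matching literature, not NExT-OMNI's actual implementation, and every name in it (MASK, corrupt, sample, toy_model) is invented for this illustration.

```python
import torch

# Hypothetical toy sketch of mask-based discrete flow matching.
# Not NExT-OMNI's code; sizes and names are illustrative only.
MASK = 0        # reserved mask token id (assumption)
VOCAB = 1024    # toy vocabulary size
SEQ = 16        # toy sequence length

def corrupt(x1: torch.Tensor, t: float) -> torch.Tensor:
    """Sample x_t on a simple probability path: each data token is
    kept with probability t and replaced by MASK otherwise."""
    keep = torch.rand(x1.shape) < t
    return torch.where(keep, x1, torch.full_like(x1, MASK))

@torch.no_grad()
def sample(model, steps: int = 8) -> torch.Tensor:
    """Euler-style sampler: start from all-MASK noise and commit the
    model's predictions for a growing fraction of positions as t -> 1."""
    x = torch.full((1, SEQ), MASK, dtype=torch.long)
    for i in range(steps):
        t_next = (i + 1) / steps
        logits = model(x)                  # (1, SEQ, VOCAB)
        pred = logits.argmax(dim=-1)       # denoised token guess
        still_masked = x == MASK
        unmask = still_masked & (torch.rand(x.shape) < t_next)
        x = torch.where(unmask, pred, x)
    return x

# Usage with a stand-in "model" that emits random logits:
toy_model = lambda x: torch.randn(x.shape[0], x.shape[1], VOCAB)
print(sample(toy_model))
```

At train time, corrupt would supply the partially masked inputs the model learns to denoise; the paper's actual probability paths and velocities are more sophisticated (metric-induced rather than a uniform mask schedule), but the sample loop conveys the basic any-step generation pattern.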

Takeaways, Limitations

Takeaways:
  • Native support for any-to-any understanding and generation, with improved response efficiency
  • Broad applicability across scenarios through a concise, unified representation instead of a task-separated design
  • Competitive performance on multimodal generation and understanding benchmarks
  • Outperforms previous unified models in multi-turn multimodal interaction and cross-modal retrieval
  • Open-source release of code, model checkpoints, training details, and data protocols
Limitations:
  • The paper does not explicitly state any limitations.