Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Created by
  • Haebom

Authors

Qianli Ma, Yaowei Zheng, Zhelun Shi, Zhongkai Zhao, Bin Jia, Ziyue Huang, Zhiqi Lin, Youjie Li, Jiacheng Yang, Yanghua Peng, Zhi Zhang, Xin Liu

Outline

This paper notes that training omnimodal Large Language Models (LLMs) remains a significant challenge: the heterogeneous model architectures required to handle diverse modalities demand sophisticated system design for large-scale training. Existing frameworks typically intertwine model definition with parallel logic, which limits scalability and adds engineering overhead to end-to-end omnimodal training. The authors present VeOmni, a modular and efficient training framework for accelerating omnimodal LLM development. VeOmni introduces model-centric distributed recipes that decouple communication from computation, enabling efficient 3D parallelism for omnimodal LLMs. It also provides a flexible configuration interface that allows new modalities to be integrated with minimal code changes. Using VeOmni, the authors train a 30B-parameter omnimodal Mixture-of-Experts (MoE) model at a throughput of 2,800 tokens/second/GPU and scale it to a 160K context length with 3D parallelism on 128 GPUs, demonstrating strong efficiency and scalability for large-scale omnimodal LLM training.
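To make the "model-centric recipe" idea concrete, here is a minimal sketch in PyTorch of what decoupling model definition from parallel logic can look like: the model is written as plain modules with no communication code, and a separate, swappable recipe decides how it is distributed. The names here (OmniModel, apply_recipe) are hypothetical illustrations, not VeOmni's actual API, and FSDP stands in for whichever 3D-parallel plan a real recipe would apply.

```python
# Sketch: parallelism lives in a swappable "recipe", not in the model.
# Hypothetical names; not VeOmni's real interface.
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

class OmniModel(nn.Module):
    """Plain model definition: no sharding or communication code inside."""
    def __init__(self, dim: int = 1024, vocab: int = 32000):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            for _ in range(4)
        )
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        for blk in self.blocks:
            x = blk(x)
        return self.head(x)

def apply_recipe(model: nn.Module) -> nn.Module:
    """Hypothetical 'recipe': the parallel strategy is chosen here, outside
    the model. Swapping this function (e.g., for tensor- or sequence-parallel
    wrapping) requires no change to OmniModel itself."""
    return FSDP(model)  # one concrete choice; the recipe owns this decision

# model = apply_recipe(OmniModel())  # call after torch.distributed is initialized
```

Keeping the parallel strategy in one function is what lets a framework ship a "zoo" of recipes that any plain model definition can opt into.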

Takeaways, Limitations

Takeaways:
  • VeOmni, a novel framework, significantly improves the efficiency and scalability of omnimodal LLM training by decoupling model definition from communication.
  • 3D parallelism enables large-scale omnimodal LLM training.
  • A flexible configuration interface makes it easy to integrate new modalities (see the sketch after this list).
  • Experimental results demonstrate VeOmni's strong performance and scalability.
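The sketch below illustrates one common way a "flexible configuration interface" for modalities can work: a registry maps modality names to encoder classes, so adding a modality means registering one class rather than editing the training loop. All names (MODALITY_ENCODERS, register_modality, build_encoders) are hypothetical and not VeOmni's actual interface.

```python
# Sketch: a modality registry driven by an experiment config.
# Hypothetical names; not VeOmni's real interface.
import torch.nn as nn

MODALITY_ENCODERS: dict[str, type[nn.Module]] = {}

def register_modality(name: str):
    """Decorator: adding a new modality means registering one encoder class,
    with no edits to the training loop or parallelization code."""
    def wrap(cls):
        MODALITY_ENCODERS[name] = cls
        return cls
    return wrap

@register_modality("audio")
class AudioEncoder(nn.Module):
    def __init__(self, in_dim: int = 128, out_dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, feats):
        # Map raw modality features into the LLM's hidden dimension.
        return self.proj(feats)

def build_encoders(config: dict) -> nn.ModuleDict:
    """Instantiate only the modalities requested in the experiment config."""
    return nn.ModuleDict({m: MODALITY_ENCODERS[m]() for m in config["modalities"]})

# encoders = build_encoders({"modalities": ["audio"]})
```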
Limitations:
  • Further research is needed on practical applications of VeOmni and its generalizability across omnimodal LLM architectures.
  • The framework may be optimized for a specific hardware environment; its portability to other hardware needs verification.
  • Further experiments and analysis are needed to assess training efficiency and stability on very large models.