Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, simply cite the source.

MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs

Created by
  • Haebom

Authors

Feilong Chen, Yijiang Liu, Yi Huang, Hao Wang, Miren Tian, Ya-Qi Yu, Minghui Liao, Jihao Wu

MindVL: A Multimodal Large Language Model Trained on Ascend NPUs

Outline

This paper proposes MindVL, a multimodal large language model (MLLM) trained on Ascend NPUs. MindVL addresses two problems in existing MLLM training: dependence on a narrow set of hardware platforms and closed data recipes. Through an efficient training framework called MindSpeed-MLLM, it supports stable, high-performance training of large dense and Mixture-of-Experts (MoE) models on Ascend hardware, and it provides a systematic, open description of its training-data production methods and mixing strategies. MindVL is a data-efficient MLLM trained end-to-end on Ascend NPUs, and its performance is further improved by averaging the weights of checkpoints trained with different sequence lengths and by applying a test-time resolution search technique (sketched below). MindVL-8B matches the performance of Qwen2.5-VL-7B with 10% of its training data, and the MoE model MindVL-671B-A37B reaches comparable performance to Qwen2.5-VL-72B with 3% of the data.
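As a concrete illustration of the checkpoint weight averaging mentioned above, here is a minimal sketch, not the paper's implementation: it assumes PyTorch-style state dicts, and the checkpoint file names are hypothetical.

import torch

def average_checkpoints(paths):
    """Element-wise average of the parameters stored in several checkpoints."""
    avg_state = None
    for path in paths:
        state = torch.load(path, map_location="cpu")  # load one checkpoint's state dict
        if avg_state is None:
            avg_state = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg_state[k] += v.float()
    # Divide the accumulated sums by the number of checkpoints.
    return {k: v / len(paths) for k, v in avg_state.items()}

# Hypothetical checkpoints trained with different maximum sequence lengths.
merged = average_checkpoints(["ckpt_seq2k.pt", "ckpt_seq4k.pt", "ckpt_seq8k.pt"])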

Takeaways, Limitations

Takeaways:
  • Presents Ascend hardware as a viable alternative platform for MLLM training.
  • Promotes reproducibility and openness in research by providing an open data recipe.
  • Presents effective performance-enhancement techniques, namely checkpoint weight averaging and test-time resolution search (a sketch follows this list).
  • Achieves competitive performance with far less data through data-efficient training.
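The test-time resolution search mentioned above can likewise be sketched minimally. This is not the paper's procedure; it assumes the caller supplies an evaluation function that scores the model on a small held-out set at a given input resolution, and the candidate resolutions are hypothetical.

def search_best_resolution(evaluate_fn, candidates=(448, 672, 896)):
    """Return the candidate input resolution that yields the highest score."""
    best_res, best_score = None, float("-inf")
    for res in candidates:
        score = evaluate_fn(res)  # e.g., validation accuracy at this resolution
        if score > best_score:
            best_res, best_score = res, score
    return best_res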
Limitations:
  • The paper may lack detailed information about specific dataset sizes and the model architecture.
  • Comprehensive comparisons with other state-of-the-art models and extensive benchmark results may be insufficient.
  • Because the training framework is designed specifically for Ascend NPUs, generalizability to other hardware environments may be limited.
  • Analysis of the model's practical applicability to diverse real-world problems may be lacking.