Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model

Created by
  • Haebom

Authors

Zhiwei Jin, Xiaohui Song, Nan Wang, Yafei Liu, Chao Li, Yuqing Qiu, Ke Chen, Zixian Li, Chi Xie, Huafei Li, Chenxing Li, Chuangchuang Wang, Kai Tang, Zhiguang Zhu, Wenmei Gao, Rui Wang, Jun Wu, Chao Liu, Qin Xie, Chen Chen, Haonan Lu

AndesVL: Mobile-Side MLLMs for Efficient Visual Understanding

Outline

This paper introduces AndesVL, a suite of mobile-friendly MLLMs with 0.6B to 4B parameters, built on the Qwen3 LLM and a range of visual encoders. AndesVL achieves best-in-class performance on a broad set of open-source benchmarks, including text-rich image understanding, reasoning and mathematics, multi-image understanding, general VQA, hallucination mitigation, multilingual understanding, and GUI-related tasks. Its 1+N LoRA architecture and Quantization-Aware LoRA Fine-Tuning (QALFT) framework enable efficient task adaptation and model compression. When deploying AndesVL-4B on a MediaTek Dimensity 9500 chip, the OKV cache eviction algorithm together with custom speculative decoding and compression strategies achieves up to a 6.7x decoding speedup, up to 30.9% memory reduction, and a footprint of 1.8 bits per weight.
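The "1+N" LoRA idea (one frozen, shared base model plus N small task-specific low-rank adapters swapped in at run time) can be illustrated with a minimal PyTorch sketch. The class name `LoRALinear`, the rank, and the adapter-switching mechanism below are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """One frozen base linear layer (the "1") plus N swappable low-rank adapters."""

    def __init__(self, base: nn.Linear, num_adapters: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the shared base weights stay frozen
        # One (A, B) pair per task: effective weight is W + (alpha / rank) * B @ A.
        self.lora_A = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(rank, base.in_features)) for _ in range(num_adapters)]
        )
        self.lora_B = nn.ParameterList(
            [nn.Parameter(torch.zeros(base.out_features, rank)) for _ in range(num_adapters)]
        )
        self.scaling = alpha / rank
        self.active = 0  # index of the adapter selected for the current task

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = self.lora_A[self.active], self.lora_B[self.active]
        return self.base(x) + self.scaling * (x @ a.T @ b.T)

# Usage: train and store N small adapters, switch between them at run time.
layer = LoRALinear(nn.Linear(512, 512), num_adapters=3)
layer.active = 2                     # activate the adapter for task 2
out = layer(torch.randn(1, 512))     # shape: (1, 512)
```

Because only the small adapter tensors differ per task, N tasks share a single set of base weights on disk and in memory, which is what makes this kind of scheme attractive on mobile hardware.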

Takeaways, Limitations

• Presents the AndesVL model family, architecture, training pipeline, and training dataset for efficient MLLM deployment in mobile environments.
• Demonstrates superior performance compared to similarly sized models across a wide range of benchmarks.
• Enables efficient task adaptation and model compression via the 1+N LoRA architecture and the QALFT framework.
• Optimizes on-device performance through the OKV cache eviction algorithm, custom speculative decoding, and compression strategies (generic sketches of both ideas follow this list).
• Improves memory usage and decoding speed when deploying the models.
• The paper's specific Limitations are not provided. (Information on Limitations is missing from the paper abstract.)
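The abstract names an OKV cache eviction algorithm but does not specify it, so the following is only a generic score-based KV-cache eviction sketch (in the spirit of methods like H2O), with the function name `evict_kv` and the cumulative-attention score as assumptions of this illustration, not the paper's algorithm:

```python
import torch

def evict_kv(keys: torch.Tensor, values: torch.Tensor,
             scores: torch.Tensor, budget: int):
    """Keep only the `budget` cached positions with the highest cumulative
    attention mass; evict the rest. keys/values: (seq, dim), scores: (seq,)."""
    if keys.shape[0] <= budget:
        return keys, values, scores
    keep = scores.topk(budget).indices.sort().values  # preserve token order
    return keys[keep], values[keep], scores[keep]
```

Likewise, the paper's speculative decoding strategy is custom; the sketch below only shows the general greedy draft-then-verify loop, assuming HuggingFace-style causal LMs whose forward pass returns `.logits`:

```python
import torch

@torch.no_grad()
def speculative_decode(target, draft, ids: torch.Tensor, k: int = 4, steps: int = 32) -> torch.Tensor:
    """Greedy speculative decoding: a small draft model proposes k tokens,
    the large target model verifies them in a single forward pass."""
    for _ in range(steps):
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal = ids
        for _ in range(k):
            nxt = draft(proposal).logits[:, -1].argmax(-1, keepdim=True)
            proposal = torch.cat([proposal, nxt], dim=-1)
        # 2. Target model scores the whole proposal in one pass (expensive, but once).
        tgt = target(proposal).logits.argmax(-1)  # target's greedy choice at each position
        # 3. Accept drafted tokens while they match the target's own choices.
        n_ctx = ids.shape[1]
        accepted = 0
        for i in range(k):
            if proposal[0, n_ctx + i] == tgt[0, n_ctx + i - 1]:
                accepted += 1
            else:
                break
        # 4. Keep the accepted prefix, then append the target's correction token.
        ids = proposal[:, : n_ctx + accepted]
        ids = torch.cat([ids, tgt[:, n_ctx + accepted - 1 : n_ctx + accepted]], dim=-1)
    return ids
```

Every accepted draft token is one the large model would have produced anyway, so quality is preserved while the expensive model runs far fewer forward passes, which is the source of the reported decoding speedup.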