Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Learning to Inference Adaptively for Multimodal Large Language Models

Created by
  • Haebom

Author

Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Saurabh Bagchi, Somali Chaterji, Yingyu Liang, Yin Li

Outline

This paper proposes AdaLLaVA, an adaptive inference framework for efficient inference with multimodal large language models (MLLMs). Conventional MLLMs are difficult to deploy in resource-constrained environments because of their high computational cost. AdaLLaVA addresses this challenge with a learning-based approach that dynamically reconfigures the MLLM's computation at inference time, taking both the input data and a latency budget into account. Experiments on a range of benchmarks covering question answering, reasoning, and hallucination show that AdaLLaVA reliably meets the input latency budget and achieves a range of accuracy-latency trade-offs at runtime. The authors further show that AdaLLaVA adapts to both input latency and content, integrates with token selection for additional efficiency, and generalizes across a variety of MLLMs.
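To make the mechanism concrete, below is a minimal sketch of latency-conditioned adaptive inference, assuming a small scheduler network that reads pooled multimodal features plus a normalized latency budget and gates which transformer blocks execute. The name `LatencyScheduler`, the gating design, and the hard threshold are illustrative assumptions, not AdaLLaVA's actual architecture.

```python
import torch
import torch.nn as nn

class LatencyScheduler(nn.Module):
    """Hypothetical scheduler: maps (input features, latency budget)
    to per-block execution probabilities. Illustrative only."""

    def __init__(self, hidden_dim: int, num_blocks: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(hidden_dim + 1, hidden_dim),  # +1 for the budget scalar
            nn.ReLU(),
            nn.Linear(hidden_dim, num_blocks),
        )

    def forward(self, features: torch.Tensor, budget: torch.Tensor) -> torch.Tensor:
        # features: (batch, hidden_dim) pooled multimodal features
        # budget:   (batch, 1) latency budget normalized to [0, 1]
        logits = self.gate(torch.cat([features, budget], dim=-1))
        return torch.sigmoid(logits)  # per-block execution probabilities


def adaptive_forward(blocks, x, exec_probs, threshold=0.5):
    """Run only the blocks whose execution probability clears the threshold.
    Hard thresholding is an inference-time simplification; training would
    require a differentiable relaxation (e.g., Gumbel-Softmax)."""
    for block, p in zip(blocks, exec_probs.squeeze(0)):
        if p.item() >= threshold:
            x = block(x)  # execute this block
        # otherwise skip the block to save latency
    return x
```

A tighter budget pushes the scheduler toward skipping more blocks, which is one simple way a single model can trade accuracy for latency at runtime.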

Takeaways, Limitations

Takeaways:
Presents a new method for running MLLMs efficiently even in resource-constrained environments.
Dynamically reconfigures MLLM computation based on the input data and latency budget to meet the budget while preserving accuracy.
Integration with token selection opens up further efficiency gains (see the sketch after this list).
Offers a general framework applicable to a variety of MLLMs.
Limitations:
AdaLLaVA's performance may vary with the underlying MLLM and the benchmark dataset.
Generalization to real-world deployment environments requires further validation.
Handling very complex questions or images may need additional research.
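Below is a minimal sketch of the token-selection idea referenced above, assuming visual tokens are pruned by an importance score before entering the language model. The norm-based score and the `keep_ratio` parameter are hypothetical placeholders; the paper's actual token-selection integration may differ.

```python
import torch

def select_tokens(visual_tokens: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep only the highest-scoring visual tokens (illustrative sketch).

    visual_tokens: (batch, num_tokens, hidden_dim)
    Returns:       (batch, k, hidden_dim) with k = num_tokens * keep_ratio
    """
    # Use the token L2 norm as a stand-in importance score; real systems
    # typically use learned or attention-based scores.
    scores = visual_tokens.norm(dim=-1)                  # (batch, num_tokens)
    k = max(1, int(visual_tokens.shape[1] * keep_ratio))
    top_idx = scores.topk(k, dim=1).indices              # (batch, k)
    top_idx, _ = top_idx.sort(dim=1)                     # keep original token order
    batch_idx = torch.arange(visual_tokens.shape[0]).unsqueeze(-1)
    return visual_tokens[batch_idx, top_idx]             # (batch, k, hidden_dim)
```

Fewer tokens entering the language model shrink the attention and MLP workloads, which compounds with block skipping to stay within a latency budget.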