This paper proposes AdaLLaVA, an adaptive inference framework for the efficient serving of multimodal large language models (MLLMs). Conventional MLLMs are difficult to deploy in resource-constrained environments due to their high computational cost. AdaLLaVA addresses this challenge with a learning-based approach that dynamically reconfigures the MLLM's computation at inference time, conditioned on both the input data and a latency budget. Through experiments on a range of benchmarks spanning question answering, reasoning, and hallucination, we demonstrate that AdaLLaVA reliably adheres to the input latency budget and achieves a range of accuracy-latency trade-offs at runtime. Furthermore, we show that AdaLLaVA adapts to both the latency budget and the input content, integrates with token selection for further efficiency gains, and generalizes across a variety of MLLMs.