Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, please cite the source.

Compression Strategies for Efficient Multimodal LLMs in Medical Contexts

Created by
  • Haebom

Authors

Tanvir A. Khan, Aranya Saha, Ismam N. Swapnil, Mohammad A. Haque

Outline

This paper evaluates efficient compression techniques for multimodal large language models (MLLMs) in healthcare settings. Specifically, it analyzes the impact of structural pruning and activation-aware quantization on a fine-tuned LLaVA model, and it proposes a novel layer selection method for pruning, evaluating the performance degradation and memory-footprint reduction of the full pruning, fine-tuning, and quantization pipeline. The authors compress a 7-billion-parameter MLLM to run in 4 GB of VRAM, achieving a 70% reduction in memory footprint and a 4% performance improvement over existing compression techniques.
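
The summary does not reproduce the paper's exact layer-selection criterion or quantization implementation, so the following is only a minimal sketch of the general flow it describes: score layers from calibration activations, prune the least important ones, then quantize the remaining weights. The toy stack of Linear layers, the activation-magnitude score, and the simulated 4-bit quantizer are all illustrative stand-ins, not the authors' method.

```python
# Minimal sketch of an activation-aware prune-then-quantize pipeline.
# Hypothetical stand-ins: a stack of Linear layers replaces real LLaVA
# blocks, and mean activation magnitude serves as the importance score.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "model": residual Linear layers standing in for MLLM blocks.
layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(8)])

def forward(x):
    for layer in layers:
        x = x + torch.relu(layer(x))
    return x

# 1) Collect activation statistics with forward hooks (activation-aware step).
act_norms = {}
hooks = []
for i, layer in enumerate(layers):
    def make_hook(idx):
        def hook(module, inputs, output):
            act_norms[idx] = act_norms.get(idx, 0.0) + output.abs().mean().item()
        return hook
    hooks.append(layer.register_forward_hook(make_hook(i)))

calib = torch.randn(32, 64)  # stand-in for a small calibration set
forward(calib)
for h in hooks:
    h.remove()

# 2) Structural pruning: drop the k layers with the smallest activation norms.
k = 2
keep = sorted(sorted(act_norms, key=act_norms.get)[k:])
layers = nn.ModuleList([layers[i] for i in keep])
print(f"kept layers: {keep}")

# (A fine-tuning pass on medical data would go here to recover accuracy.)

# 3) 4-bit weight quantization, simulated with per-tensor symmetric rounding.
def quantize_4bit(w):
    scale = w.abs().max() / 7           # symmetric int4 range: [-8, 7]
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q * scale                    # dequantized weights, for simulation

with torch.no_grad():
    for layer in layers:
        layer.weight.copy_(quantize_4bit(layer.weight))
```

In a real deployment the quantized weights would be stored as packed integers rather than dequantized floats; the simulation above only illustrates the numerical effect of the rounding.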

Takeaways, Limitations

Takeaways:
  • Efficient compression techniques broaden the practical applicability of MLLMs in the medical field.
  • The proposed layer selection method and quantization technique outperform existing approaches.
  • Enabling MLLMs to run in memory-constrained environments improves accessibility (see the rough VRAM estimate after this list).
Limitations:
  • Whether the methodology, developed on the LLaVA model, generalizes to other MLLMs requires further study.
  • The type and size of the medical dataset used for evaluation are not clearly described.
  • Further validation is needed to determine whether the 4% performance improvement is meaningful across medical applications.
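
A back-of-envelope check, assuming an FP16 baseline and weight-only 4-bit quantization (both assumptions, not stated in the summary), shows the headline figures are plausible:

```python
# Rough VRAM estimate for a 7B-parameter model (assumes FP16 baseline
# weights and weight-only 4-bit quantization; activations, KV cache, and
# runtime overheads are ignored, so real usage is somewhat higher).
params = 7e9
fp16_gb = params * 2 / 1e9    # 2 bytes/param   -> ~14.0 GB
int4_gb = params * 0.5 / 1e9  # 0.5 bytes/param -> ~3.5 GB
print(f"FP16: {fp16_gb:.1f} GB, INT4: {int4_gb:.1f} GB, "
      f"reduction: {1 - int4_gb / fp16_gb:.0%}")
# -> FP16: 14.0 GB, INT4: 3.5 GB, reduction: 75%
```

The roughly 75% weight-only saving is consistent with the paper's reported 70% overall reduction once runtime overheads are accounted for, which makes the 4 GB VRAM budget plausible.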