This paper evaluates an efficient compression technique for multimodal large language models (MLLMs) in healthcare applications. Specifically, we analyze the impact of structural pruning and activation-aware quantization on a fine-tuned LLaVA model. We also propose a novel layer selection method and evaluate the performance degradation and memory footprint reduction of the resulting pruning-fine-tuning-quantization pipeline. We compress a 7-billion-parameter MLLM to run on 4 GB of VRAM, achieving a 70% reduction in memory footprint and a 4% performance improvement over existing techniques.