This paper presents a novel medical assistance system for deploying large language models (LLMs) in resource-constrained environments such as real-time healthcare settings. The system tailors LLMs to specific domains using a general-purpose compression framework: it measures neuron importance on domain-specific data and aggressively prunes irrelevant neurons, reducing model size while maintaining performance. Post-training quantization is then applied to further reduce memory usage, and the compressed models are evaluated on healthcare benchmarks including MedMCQA, MedQA, and PubMedQA. Finally, we deploy a 50% compressed Gemma model and a 67% compressed LLaMA3 model on a Jetson Orin Nano and a Raspberry Pi 5, achieving real-time, energy-efficient inference under hardware constraints.
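As a rough illustration of the pipeline summarized above, the sketch below scores the neurons of a single linear layer by mean absolute activation on calibration data, prunes the lowest-scoring half, and then applies post-training dynamic quantization. It is a minimal sketch under assumptions: the activation-based importance score, the helper names (`neuron_importance`, `prune_neurons`), and the 50% pruning ratio are illustrative, not the paper's exact method.

```python
# Minimal sketch (illustrative, not the paper's implementation):
# importance-based neuron pruning followed by post-training quantization.
import torch
import torch.nn as nn

def neuron_importance(layer: nn.Linear, calibration_batches):
    """Mean absolute activation per output neuron over domain-specific calibration data."""
    scores = torch.zeros(layer.out_features)
    n = 0
    with torch.no_grad():
        for x in calibration_batches:
            act = layer(x)                      # (batch, out_features)
            scores += act.abs().mean(dim=0)
            n += 1
    return scores / max(n, 1)

def prune_neurons(layer: nn.Linear, scores, prune_fraction=0.5):
    """Keep the highest-importance output neurons; return a smaller linear layer."""
    k = max(1, int(layer.out_features * (1 - prune_fraction)))
    keep = torch.topk(scores, k).indices.sort().values
    pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned

if __name__ == "__main__":
    layer = nn.Linear(64, 128)
    calib = [torch.randn(8, 64) for _ in range(10)]   # stand-in for domain data
    scores = neuron_importance(layer, calib)
    small = prune_neurons(layer, scores, prune_fraction=0.5)   # ~50% smaller layer
    model = nn.Sequential(small)
    # Post-training dynamic quantization (int8 weights) to further reduce memory.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
    print(small.out_features, quantized)
```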