This paper addresses two key challenges in applying multimodal large language models (MLLMs) to healthcare: the scarcity of multimodal medical data and the reliability of Reinforcement Learning with Verifiable Rewards (RLVR) in the medical domain. To address these challenges, during the Supervised Fine-Tuning (SFT) stage we combined multimodal medical datasets with high-quality text reasoning data and general multimodal data, enhancing the model's foundational medical capabilities while restoring its reasoning ability. Furthermore, given the scarcity of multimodal medical data, we synthesized chain-of-thought (CoT) samples injected with reflective patterns, in addition to general CoT samples, to instill early reflective reasoning capabilities. As a result, we developed the InfiMed-SFT-3B and InfiMed-RL-3B models, which achieved the highest performance across seven multimodal medical benchmarks. In particular, InfiMed-RL-3B achieved an average accuracy of 59.2%, outperforming InternVL3-8B (57.3%).