To address two challenges hindering the safe application of multimodal large language models (MLLMs) in healthcare—sensitivity to prompt design and the generation of incorrect responses with high confidence—we propose Prompt4Trust, a reinforcement learning (RL)-based framework for confidence calibration of MLLMs. Prompt4Trust trains a lightweight LLM to generate auxiliary prompts that guide the downstream MLLM to express confidence scores that better reflect its predictive accuracy. While prioritizing confidence calibration in clinical contexts, this method also improves medical image question-answering performance on the PMC-VQA benchmark. Furthermore, Prompt4Trust, trained with a small downstream MLLM, demonstrates zero-shot generalization to larger MLLMs, suggesting the potential for scalable calibration without high computational costs.
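To give a concrete sense of what "confidence matching predictive accuracy" can mean as an RL objective, the sketch below shows one plausible reward form, a negative Brier score, that a prompt-generating policy could maximize. This is an illustrative assumption, not the reward used in Prompt4Trust: the function names and the demo values are hypothetical.

```python
# Illustrative sketch (assumed reward form, not the authors' implementation):
# a calibration-style reward that is highest when the downstream MLLM's
# expressed confidence tracks whether its answer is actually correct.

def calibration_reward(is_correct: bool, expressed_confidence: float) -> float:
    """Negative Brier score: near 0 when confidence matches correctness,
    strongly negative when the model is confidently wrong."""
    target = 1.0 if is_correct else 0.0
    return -(expressed_confidence - target) ** 2


if __name__ == "__main__":
    # Well-calibrated behaviour: confident when right, unsure when wrong.
    print(calibration_reward(True, 0.95))   # ~ -0.0025 (good)
    print(calibration_reward(False, 0.10))  # ~ -0.01   (good)
    # Miscalibrated behaviour: confidently wrong.
    print(calibration_reward(False, 0.95))  # ~ -0.9    (heavily penalized)
```

Under such a reward, an RL-trained prompt policy is pushed toward auxiliary prompts that discourage overconfident wrong answers, which is the failure mode the abstract highlights.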