Med-RewardBench is the first benchmark specifically designed to evaluate reward models and evaluators for multimodal large language models (MLLMs) in healthcare applications. Featuring a multimodal dataset of 1,026 expert-annotated cases spanning 13 organ systems and 8 clinical departments, Med-RewardBench is constructed through a rigorous three-step process to ensure high-quality evaluation data across six clinically important dimensions. Unlike existing benchmarks that focus on general MLLM capabilities or evaluate models as problem solvers, Med-RewardBench targets evaluation dimensions essential to clinical practice, such as diagnostic accuracy and clinical relevance. This study evaluates 32 state-of-the-art MLLMs, including open-source, proprietary, and healthcare-specific models, revealing substantial challenges in aligning with expert judgment. Furthermore, we develop a baseline model that significantly improves performance through fine-tuning.