Large Vision-Language Models (LVLMs) show strong performance on multimodal tasks, but they tend to generate hallucinations that are inconsistent with the visual input, in part because they have limited ability to verify a response against evidence from other regions of the image. To address this, this paper proposes Multi-Region Fusion Decoding (MRFD), a training-free decoding method that improves faithfulness by modeling consistency across regions. MRFD identifies salient regions using cross-attention, generates an initial response for each region, and computes confidence weights for the regions from the Jensen-Shannon Divergence (JSD) among these responses. The weights then guide a consistency-aware fusion of the per-region predictions, using region-aware prompts inspired by Chain-of-Thought reasoning. Experiments with multiple LVLMs and benchmarks show that MRFD significantly reduces hallucinations and improves the faithfulness of responses without requiring any model updates.
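As a rough illustration of the JSD-weighted fusion step, the sketch below (a simplification under our own assumptions, not the authors' implementation) weights each region's next-token distribution by how little it diverges, on average, from the other regions' distributions, then mixes the distributions with those weights. The function names `jsd` and `fuse_region_distributions`, the softmax-over-negative-JSD weighting, and the `temperature` parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def jsd(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Jensen-Shannon Divergence between two probability vectors."""
    m = 0.5 * (p + q)
    kl_pm = torch.sum(p * (torch.log(p + eps) - torch.log(m + eps)))
    kl_qm = torch.sum(q * (torch.log(q + eps) - torch.log(m + eps)))
    return 0.5 * (kl_pm + kl_qm)

def fuse_region_distributions(region_logits, temperature: float = 1.0) -> torch.Tensor:
    """
    Fuse next-token logits from several region-conditioned forward passes.

    Each region gets a confidence weight that decreases with its average JSD
    to the other regions, so regions whose predictions are consistent with
    the rest contribute more to the fused distribution (illustrative scheme).
    """
    probs = [F.softmax(logits, dim=-1) for logits in region_logits]
    n = len(probs)
    # Average pairwise JSD of each region against all other regions.
    avg_div = torch.stack([
        torch.stack([jsd(probs[i], probs[j]) for j in range(n) if j != i]).mean()
        for i in range(n)
    ])
    # Lower divergence -> higher weight (softmax over negative average JSD).
    weights = F.softmax(-avg_div / temperature, dim=0)
    fused = sum(w * p for w, p in zip(weights, probs))
    return torch.log(fused + 1e-12)  # fused log-probabilities for decoding

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy example: 4 salient regions, vocabulary of 6 tokens.
    region_logits = [torch.randn(6) for _ in range(4)]
    fused_logprobs = fuse_region_distributions(region_logits)
    print("next token id:", fused_logprobs.argmax().item())
```

In an actual LVLM decoding loop, `region_logits` would come from forward passes conditioned on each salient region (plus its region-aware prompt), and the fused log-probabilities would replace the model's original next-token distribution at each step.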