Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions. When sharing, please cite the source.

MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs

Created by
  • Haebom

Authors

Haonan Ge, Yiwei Wang, Ming-Hsuan Yang, Yujun Cai

Outline

Large vision-language models (LVLMs) perform strongly on multimodal tasks, but they often hallucinate, producing text that is inconsistent with the visual input. The paper attributes this to their limited ability to verify information across different regions of an image. To address it, the paper proposes Multi-Region Fusion Decoding (MRFD), a training-free decoding method that improves factual grounding by modeling consistency across regions. MRFD uses cross-attention to identify salient regions, generates an initial response for each region, and computes reliability weights from the Jensen-Shannon Divergence (JSD) among those responses. These weights guide a consistency-aware fusion of the per-region predictions, using region-aware prompts inspired by Chain-of-Thought reasoning. Experiments across multiple LVLMs and benchmarks show that MRFD significantly reduces hallucinations and improves response factuality without requiring model updates.
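
The weighting and fusion steps can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not give the exact formulas, so the weighting rule (a softmax over the negative mean JSD to the other regions), the `temperature` parameter, the function names, and the toy distributions are all assumptions made for illustration. Each region is represented here by a probability distribution over a tiny three-token vocabulary.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def consistency_weights(region_dists, temperature=1.0):
    """Assumed rule: regions that agree with the others get larger weights.

    Weight_i is proportional to exp(-mean_j JSD(p_i, p_j) / temperature).
    """
    n = len(region_dists)
    if n == 1:
        return [1.0]
    mean_jsd = [
        sum(js_divergence(region_dists[i], region_dists[j])
            for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    scores = [math.exp(-d / temperature) for d in mean_jsd]
    total = sum(scores)
    return [s / total for s in scores]

def fuse(region_dists, weights):
    """Weighted average of the per-region distributions."""
    vocab_size = len(region_dists[0])
    return [
        sum(w * dist[k] for w, dist in zip(weights, region_dists))
        for k in range(vocab_size)
    ]

# Toy data (hypothetical): regions 0 and 1 agree, region 2 is an outlier.
region_dists = [
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.10, 0.80],
]
weights = consistency_weights(region_dists)
fused = fuse(region_dists, weights)
print("weights:", [round(w, 3) for w in weights])
print("fused:  ", [round(x, 3) for x in fused])
```

In this toy setup the outlier region receives the smallest weight, so the fused distribution leans toward the prediction the consistent regions share. A lower `temperature` would sharpen that preference.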

Takeaways, Limitations

Takeaways:
Presents a novel, training-free decoding method that mitigates hallucinations in LVLMs.
Shown to be effective across multiple LVLMs and benchmarks.
Improves the factuality of responses.
Limitations:
No specific limitations are mentioned in the paper (based on the summary).