Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Controlling Multimodal LLMs via Reward-guided Decoding

Created by
  • Haebom

Author

Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero-Soriano, Michal Drozdzal, Aishwarya Agrawal

Outline

This paper studies how to adapt multimodal large language models (MLLMs) to diverse user needs via reward-guided decoding. Specifically, the authors build two separate reward models, one controlling object precision and one controlling object recall, and use them to guide the MLLM's decoding process toward better visual grounding. By adjusting the relative weight of the two reward functions and the search depth during decoding, the method can trade off object precision against recall, and test-time computation against visual grounding, in image captioning. On standard object hallucination benchmarks, the approach shows better performance and controllability than existing hallucination mitigation methods.
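To make the idea concrete, here is a minimal, hypothetical Python sketch of the scoring rule behind reward-guided decoding: candidates are ranked by the language model's score plus a weighted sum of the two reward signals. The scorer names (`lm_logprob`, `reward_precision`, `reward_recall`) and the toy data are illustrative assumptions, not the authors' implementation, which applies the rewards during the decoding search itself rather than to finished captions.

```python
# Minimal sketch of reward-guided candidate re-ranking, assuming hypothetical
# scorers (lm_logprob, reward_precision, reward_recall). The paper guides
# decoding step by step and also varies search depth; this toy only shows the
# weighted scoring rule that combines the language model with the two rewards.

def reward_guided_decode(candidates, lm_logprob, reward_precision, reward_recall,
                         w_precision=1.0, w_recall=1.0):
    """Return the candidate maximizing LM log-probability plus weighted rewards."""
    def score(text):
        return (lm_logprob(text)
                + w_precision * reward_precision(text)
                + w_recall * reward_recall(text))
    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy stand-ins: the image "contains" a dog, a ball, and grass.
    grounded = {"dog", "ball", "grass"}
    distractors = {"cat", "frisbee"}

    def lm_logprob(t):                      # placeholder fluency score
        return -0.1 * len(t.split())

    def reward_precision(t):                # penalize hallucinated objects
        return -2.0 * sum(1 for o in distractors if o in t.lower())

    def reward_recall(t):                   # reward mentions of grounded objects
        return sum(1 for o in grounded if o in t.lower())

    captions = [
        "A dog chases a ball on the grass.",
        "A dog and a cat play with a frisbee.",
        "A dog.",
    ]
    # Raising w_precision favors captions without hallucinated objects;
    # raising w_recall favors captions covering more grounded objects.
    print(reward_guided_decode(captions, lm_logprob, reward_precision,
                               reward_recall, w_precision=2.0, w_recall=1.0))
```

Varying `w_precision` and `w_recall` at inference time is what gives the user-facing control described above; no retraining of the MLLM is involved in this sketch.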

Takeaways, Limitations

Takeaways:
Presents a novel method for effectively controlling the decoding process of MLLMs so that it can be tailored to various user needs.
Suggests that image caption quality can be improved by independently controlling object precision and recall.
Enables an efficient trade-off between test-time computation and visual grounding accuracy.
Improves the practicality of deploying MLLMs through superior performance over existing hallucination mitigation methods.
Limitations:
Further research is needed on the generalization of the proposed method; its applicability to other MLLMs and tasks remains to be verified.
The reward model design and optimization process are not analyzed in detail; further explanation of how the reward models are designed and trained is needed.
Only results on specific benchmarks are reported, so generalization to other datasets or tasks is uncertain.