Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Rationale-guided Prompting for Knowledge-based Visual Question Answering

Created by
  • Haebom

Authors

Zhongjian Hu, Peng Yang, Bing Li, Fengyuan Liu

Outline

This paper explores the use of large language models (LLMs) for knowledge-based visual question answering (VQA). Unlike previous studies that prompt the LLM to predict the answer directly, it proposes a framework, PLRH, built around rationale heuristics, which serve as an intermediate reasoning step. PLRH first uses Chain of Thought (CoT) prompting to have the LLM generate rationale heuristics, then feeds those rationales back to the LLM to predict the answer. Experimental results show that PLRH outperforms existing baseline models by 2.2 points on OK-VQA and 2.1 points on A-OKVQA.
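The two-stage flow described above (generate a rationale first, then predict the answer from it) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names, the use of an image caption as the textual stand-in for the image, and the `canned_llm` stub are all assumptions made here for the example. Any callable that maps a prompt string to a completion string could be substituted for the stub.

```python
from typing import Callable, Tuple

# Hypothetical type alias: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def build_rationale_prompt(caption: str, question: str) -> str:
    """Stage 1 prompt: ask for step-by-step reasoning (CoT), not the answer."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        "Think step by step and explain the reasoning needed to answer.\n"
        "Rationale:"
    )


def build_answer_prompt(caption: str, question: str, rationale: str) -> str:
    """Stage 2 prompt: supply the generated rationale as a heuristic."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Rationale: {rationale}\n"
        "Answer:"
    )


def answer_with_rationale(llm: LLM, caption: str, question: str) -> Tuple[str, str]:
    """Run both stages and return (rationale, answer)."""
    rationale = llm(build_rationale_prompt(caption, question)).strip()
    answer = llm(build_answer_prompt(caption, question, rationale)).strip()
    return rationale, answer


def canned_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline (assumption)."""
    if prompt.endswith("Rationale:"):
        return "Umbrellas are typically used to stay dry, and the street is wet."
    return "rain"


rationale, answer = answer_with_rationale(
    canned_llm,
    "A man holding an umbrella on a wet street.",
    "What weather is he protecting himself from?",
)
print(rationale)
print(answer)
```

Keeping the model behind a plain callable makes the two stages easy to inspect separately, for example by logging the stage 1 rationale before it is passed to stage 2.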

Takeaways, Limitations

Takeaways:
Shows that prompting the LLM through an intermediate reasoning step, rather than asking for the answer directly, makes better use of its capabilities.
Demonstrates the advantage of PLRH, a framework that combines CoT with rationale heuristics, in knowledge-based VQA.
Improves on existing baseline models on both the OK-VQA and A-OKVQA datasets.
Limitations:
Further research is needed on how well the proposed method generalizes.
It remains to be verified whether the gains on OK-VQA and A-OKVQA carry over to other datasets.
Further analysis is needed of the interpretability and reliability of the rationale generation process.