Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering

Created by
  • Haebom

Authors

Zhongjian Hu, Peng Yang, Bing Li, Zhenqi Wang

Outline

This paper proposes a multi-agent voting framework to address two shortcomings of existing large language model (LLM)-based methods for knowledge-based visual question answering (VQA): the lack of autonomous external tool use and the absence of teamwork. Inspired by the human tendency to answer familiar questions directly while turning to tools such as search engines for unfamiliar ones, the authors design three LLM-based agents with different capabilities, each of which decides whether to use external tools according to its own capability. The final answer is derived by voting over the agents' individual answers. Experiments on the OK-VQA and A-OKVQA datasets show that the proposed framework outperforms existing methods by 2.2 and 1.0 points, respectively.
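To make the scheme concrete, below is a minimal Python sketch of the capability-gated tool use and majority voting described above. It is an illustration under stated assumptions, not the authors' implementation: `call_llm` and `search_tool` are hypothetical placeholders for a real LLM API and search engine, the image is assumed to be represented by a caption (a common simplification in LLM-based VQA), and the prompts and three-agent lineup are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (e.g., a Gemini or GPT client)."""
    raise NotImplementedError


def search_tool(query: str) -> str:
    """Placeholder for an external knowledge source such as a search engine."""
    raise NotImplementedError


@dataclass
class Agent:
    name: str
    tool: Optional[Callable[[str], str]]  # None = answer from the LLM's own knowledge

    def is_unfamiliar(self, question: str) -> bool:
        # Capability check: ask the model whether it can answer confidently,
        # echoing the paper's idea of gating tool use on agent capability.
        reply = call_llm(f"Can you answer this confidently? Reply yes or no.\n{question}")
        return reply.strip().lower().startswith("no")

    def answer(self, question: str, caption: str) -> str:
        prompt = f"Image caption: {caption}\nQuestion: {question}"
        if self.tool is not None and self.is_unfamiliar(question):
            # Unfamiliar question: retrieve external knowledge before answering,
            # as a human would turn to a search engine.
            prompt += f"\nRetrieved knowledge: {self.tool(question)}"
        return call_llm(prompt + "\nAnswer with a short phrase:")


def vote(answers: list[str]) -> str:
    """Majority vote over agent answers; ties fall to the first answer seen."""
    return Counter(answers).most_common(1)[0][0]


def multi_agent_vqa(question: str, caption: str) -> str:
    agents = [
        Agent("direct", tool=None),           # answers directly, no tool
        Agent("searcher", tool=search_tool),  # may consult the search engine
        Agent("searcher-2", tool=search_tool),
    ]
    return vote([agent.answer(question, caption) for agent in agents])
```

Majority voting is a simple but robust aggregation choice here: it needs no trained combiner, and a single agent that retrieves misleading evidence can be outvoted by the others.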

Takeaways and Limitations

Takeaways:
  • Highlights the importance of external tool use and collaboration in LLM-based VQA, and proposes a multi-agent voting framework that implements both.
  • Experimental results verify the strong performance of the proposed framework against existing methods.
  • Offers a novel approach to improving LLM performance by mimicking human problem-solving.
Limitations:
  • Further research is needed to explore the generalizability of the proposed framework's agent design and tool-allocation strategies.
  • Further experiments are needed on different types of VQA datasets and external tools.
  • More sophisticated research is needed into the interaction and communication mechanisms between agents.