This paper proposes a multi-agent voting framework that addresses two limitations of existing large language model (LLM)-based methods for visual question answering (VQA): the inability to decide autonomously when to use external tools and the lack of collaboration among models. Inspired by the human tendency to answer familiar questions directly and to rely on tools such as search engines for unfamiliar ones, we design three LLM-based agents with different capabilities, each of which decides whether to invoke external tools according to its own capability. The final answer is obtained by voting over the agents' individual answers. Experimental results on the OK-VQA and A-OKVQA datasets show that the proposed framework improves performance by 2.2 and 1.0, respectively, compared to existing methods.
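To make the agent-and-voting idea concrete, the following minimal Python sketch shows one plausible way such a framework could be wired together. The `Agent` interface, the confidence check that triggers tool use, and the tie-breaking rule are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Sketch (assumptions): each agent answers a VQA question, optionally consulting
# an external tool when it judges the question unfamiliar; the final answer is
# the majority vote over the agents' answers.
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    name: str
    answer_fn: Callable[[str, str], str]        # (visual context, question) -> answer
    is_confident_fn: Callable[[str], bool]      # question -> can I answer without tools?
    tool_fn: Callable[[str], str] = field(default=lambda q: "")  # e.g. a search lookup (assumed)

    def answer(self, visual_context: str, question: str) -> str:
        if self.is_confident_fn(question):
            # Familiar question: answer directly from the agent's own knowledge.
            return self.answer_fn(visual_context, question)
        # Unfamiliar question: retrieve external evidence first, then answer.
        evidence = self.tool_fn(question)
        return self.answer_fn(visual_context + " " + evidence, question)


def vote(agents: List[Agent], visual_context: str, question: str) -> str:
    """Collect one answer per agent and return the most common one."""
    answers = [a.answer(visual_context, question) for a in agents]
    # Ties are broken by the order in which answers were first produced (an assumption).
    return Counter(answers).most_common(1)[0][0]
```

In this sketch, each agent independently chooses between answering from its own knowledge and consulting a tool, and disagreements among the three agents are resolved by simple majority voting.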