Retrieval-augmented generation (RAG) excels at open-ended question answering, but existing search engines retrieve only shallow, surface-level content, which limits LLMs' ability to handle complex, multi-layered information. This paper presents WebWalkerQA, a benchmark that evaluates how well LLMs can systematically traverse a website's subpages to extract high-quality information. It also proposes WebWalker, a multi-agent framework that mimics human-like web navigation through an explore-critic paradigm. Experiments show that WebWalkerQA is a challenging benchmark, and that combining RAG with WebWalker via horizontal and vertical integration is effective in real-world scenarios.
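For intuition, below is a minimal sketch of how an explore-critic loop might be structured, assuming an explorer agent that chooses which subpage to follow next and a critic agent that accumulates evidence and decides when enough has been gathered to answer. All names here (Page, fetch, call_llm, explore_step, critic_step, webwalk) are illustrative assumptions, not the authors' actual implementation or API.

```python
# Minimal sketch of an explore-critic web-navigation loop in the spirit of
# WebWalker. Placeholder functions must be replaced with a real LLM client
# and a real page fetcher; names are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    links: list[str]


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; swap in a real client."""
    raise NotImplementedError


def fetch(url: str) -> Page:
    """Placeholder for fetching and parsing a web page."""
    raise NotImplementedError


def explore_step(question: str, page: Page) -> str | None:
    """Explorer agent: pick the subpage link most likely to lead to the answer."""
    prompt = (
        f"Question: {question}\nCurrent page: {page.url}\n"
        "Candidate links:\n" + "\n".join(page.links) +
        "\nReply with the single most promising link, or NONE."
    )
    choice = call_llm(prompt).strip()
    return None if choice == "NONE" else choice


def critic_step(question: str, page: Page, memory: list[str]) -> str | None:
    """Critic agent: store useful evidence and decide whether it suffices to answer."""
    evidence = call_llm(
        f"Question: {question}\nPage text:\n{page.text}\n"
        "Extract any information useful for answering, or reply NONE."
    )
    if evidence.strip() != "NONE":
        memory.append(evidence)
    verdict = call_llm(
        f"Question: {question}\nEvidence so far:\n" + "\n".join(memory) +
        "\nIf this is enough, answer the question; otherwise reply CONTINUE."
    )
    return None if verdict.strip() == "CONTINUE" else verdict


def webwalk(question: str, root_url: str, max_steps: int = 10) -> str | None:
    """Alternate explorer and critic until an answer is found or the budget runs out."""
    memory: list[str] = []
    url: str | None = root_url
    for _ in range(max_steps):
        if url is None:
            break
        page = fetch(url)
        answer = critic_step(question, page, memory)
        if answer is not None:
            return answer
        url = explore_step(question, page)
    return None
```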
Takeaways and Limitations
• Takeaways:
  ◦ WebWalkerQA provides a new benchmark for assessing LLMs' web navigation capabilities.
  ◦ WebWalker is shown to be an effective multi-agent framework that improves RAG performance.
  ◦ Horizontal and vertical integration of RAG with WebWalker is shown to be effective in real-world scenarios.
• Limitations:
  ◦ Further analysis is needed of WebWalkerQA's difficulty and of how it differs from the real web environment.
  ◦ Further research is needed on WebWalker's scalability and its adaptability to diverse website architectures.
  ◦ Further work is needed to improve WebWalker's performance.