Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

WebWalker: Benchmarking LLMs in Web Traversal

Created by
  • Haebom

Authors

Jialong Wu, Wenbiao Yin, Yong Jiang, Zhenglin Wang, Zekun Xi, Runnan Fang, Linhai Zhang, Yulan He, Deyu Zhou, Pengjun Xie, Fei Huang

Outline

Retrieval-augmented generation (RAG) performs well on open-ended question-answering tasks, but conventional search engines often retrieve only surface-level content, which limits an LLM's ability to handle complex, multi-layered information. This paper presents WebWalkerQA, a benchmark for evaluating how well LLMs explore the web. It measures an LLM's ability to systematically extract high-quality data by traversing the subpages of a website. The authors also propose WebWalker, a multi-agent framework that mimics human-like web exploration using the explore-critique paradigm. Experimental results show that WebWalkerQA is a challenging benchmark, and that combining RAG with WebWalker through horizontal and vertical integration is effective in real-world scenarios.
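To make the idea of exploring subpages under an explore-critique loop more concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy site, the `explore` and `critic` functions, and the keyword-matching rule are all hypothetical stand-ins chosen for illustration. In the actual framework, LLM agents would presumably decide which link to follow and whether the collected information is sufficient.

```python
from collections import deque

# Hypothetical toy site (not from the paper): URL -> (page text, outgoing links).
TOY_SITE = {
    "/": ("Welcome to the conference site.", ["/program", "/venue"]),
    "/program": ("Program overview.", ["/program/keynotes"]),
    "/venue": ("The venue is the Riverside Hall.", []),
    "/program/keynotes": ("The keynote starts at 09:30.", []),
}


def critic(query_terms, memory):
    """Stand-in critic: return the first remembered page text that
    contains every query term, or None if nothing suffices yet."""
    for text in memory:
        lowered = text.lower()
        if all(term in lowered for term in query_terms):
            return text
    return None


def explore(site, root, query_terms, max_steps=10):
    """Stand-in explorer: breadth-first walk over subpages, asking the
    critic after each visit whether the question can be answered."""
    memory, visited = [], []
    queue, seen = deque([root]), {root}
    while queue and len(visited) < max_steps:
        url = queue.popleft()
        text, links = site[url]
        visited.append(url)
        memory.append(text)
        answer = critic(query_terms, memory)
        if answer is not None:
            return answer, visited
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return None, visited


answer, path = explore(TOY_SITE, "/", ["keynote", "starts"])
print(answer)
print(path)
```

In this toy example the answer sits two clicks below the root page, so a lookup that only sees the root page would miss it. The breadth-first order is an assumption made for simplicity; an LLM-driven explorer would likely choose links selectively instead of visiting every page.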

Takeaways, Limitations

Takeaways:
WebWalkerQA provides a new benchmark to assess LLMs' web navigation skills.
WebWalker is shown to be an effective multi-agent framework that improves the performance of RAG.
Horizontal and vertical integration of RAG and WebWalker is shown to be effective in real-world scenarios.
Limitations:
Further analysis is needed on the difficulty of WebWalkerQA and the differences between it and the actual web environment.
Further research is needed on WebWalker's scalability and adaptability to various website architectures.
Further research is needed to improve the performance of the proposed WebWalker.