Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Created by
  • Haebom

Authors

Jiaxuan Gao, Wei Fu, Minyang Xie, Shusheng Xu, Chuyi He, Zhiyu Mei, Banghua Zhu, Yi Wu

Outline

This paper introduces ASearcher, an open-source project for improving the search capabilities of large language model (LLM)-based agents. Existing LLM-based agents handle complex, knowledge-intensive tasks well, but they fall short of expert-level search intelligence, such as resolving ambiguous questions, issuing precise queries, analyzing results, and exploring thoroughly. To address these limitations, ASearcher combines two components: a scalable and efficient asynchronous reinforcement learning (RL) training framework that enables long-horizon search, and a prompt-based LLM agent that automatically generates a high-quality question-answering (QA) dataset. With these, ASearcher outperforms existing open-source agents on the xBench and GAIA benchmarks. It also demonstrates extremely long-horizon search, with tool calls exceeding 40 turns and outputs exceeding 150,000 tokens. The model, training data, and code are publicly available.
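To illustrate why asynchronous training matters for long-horizon search, here is a minimal sketch in Python. It is not ASearcher's implementation. It assumes that "asynchronous" means the trainer consumes trajectories as they finish instead of waiting for the slowest rollout in a batch. The names `rollout`, `trainer`, and `run_demo`, the turn counts, and the batch size are all hypothetical, and `asyncio.sleep` stands in for a tool call.

```python
import asyncio


async def rollout(task_id, num_turns, queue):
    """Simulate one agent trajectory; each turn stands in for one tool call."""
    for _ in range(num_turns):
        await asyncio.sleep(0.001)  # placeholder for a search or browse call
    await queue.put((task_id, num_turns))


async def trainer(queue, total, batch_size):
    """Form training batches from whichever trajectories finish first."""
    batches, batch = [], []
    for _ in range(total):
        batch.append(await queue.get())
        if len(batch) == batch_size:
            batches.append(batch)  # a real trainer would run an update here
            batch = []
    if batch:
        batches.append(batch)
    return batches


async def main():
    queue = asyncio.Queue()
    # Hypothetical turn counts: task "b" is a long, 40-turn trajectory.
    turns = {"a": 2, "b": 40, "c": 3, "d": 5}
    workers = [
        asyncio.create_task(rollout(task_id, n, queue))
        for task_id, n in turns.items()
    ]
    batches = await trainer(queue, total=len(turns), batch_size=2)
    await asyncio.gather(*workers)
    return batches


def run_demo():
    return asyncio.run(main())
```

In this sketch the first batch is formed from the two shortest trajectories while the 40-turn trajectory is still running, so the long rollout does not stall the update. A synchronous scheme would instead wait for all four rollouts before training on any of them.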

Takeaways, Limitations

Takeaways:
Presents a scalable and efficient asynchronous RL framework for training LLM agents.
Automatically generates a high-quality QA dataset using a prompt-based LLM agent.
Outperforms existing open-source agents on the xBench and GAIA benchmarks (measured by Avg@4).
Achieves extremely long-horizon search (tool calls exceeding 40 turns, outputs exceeding 150k tokens).
Supports further research and development by releasing the model, training data, and code as open source.
Limitations:
Further research is needed to determine how well the proposed methodology generalizes.
Additional performance evaluations are needed across a wider range of domains and tasks.
The safety and ethical implications of search agents need to be considered.
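
The Avg@4 metric mentioned in the takeaways is not defined on this page. A minimal sketch, assuming it means the accuracy of each question averaged over four independent attempts and then averaged over all questions, could look like this. The function name `avg_at_k` and the example scores are hypothetical and are not results from the paper.

```python
def avg_at_k(scores_per_question):
    """Average per-question mean scores.

    scores_per_question: a list with one inner list per question, holding
    the 0/1 correctness of each of the k attempts (k = 4 for Avg@4).
    """
    per_question = [sum(scores) / len(scores) for scores in scores_per_question]
    return sum(per_question) / len(per_question)


# Made-up scores for two questions with four attempts each.
example_scores = [
    [1, 0, 1, 1],  # question 1: 3 of 4 attempts correct
    [0, 0, 1, 0],  # question 2: 1 of 4 attempts correct
]
example_result = avg_at_k(example_scores)  # (0.75 + 0.25) / 2 = 0.5
```

Unlike a best-of-four score, this average rewards an agent for being correct consistently across attempts, not just once.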