Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Created by
  • Haebom

Author

Jiaxuan Gao, Wei Fu, Minyang Xie, Shusheng Xu, Chuyi He, Zhiyu Mei, Banghua Zhu, Yi Wu

Outline

This paper introduces ASearcher, an open-source project for enhancing the search capabilities of agents built on large language models (LLMs). Existing LLM-based agents rely heavily on external tools, particularly search tools, to handle complex tasks, yet they fall short of expert-level search intelligence (e.g., resolving ambiguous questions, generating accurate responses, analyzing search results, and exploring thoroughly). To overcome these limitations, ASearcher provides a scalable and efficient asynchronous reinforcement learning (RL) training framework. The LLM agent synthesizes its own high-quality question-and-answer (QA) dataset and learns to carry out long-horizon searches (over 40 turns, with over 15k output tokens). Experiments show that it outperforms existing open-source 32B agents on the xBench and GAIA benchmarks. The model, training data, and code are publicly available.
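The long-horizon behavior described above can be pictured as a turn-based loop in which the agent alternates between issuing search queries and reasoning over the results until it commits to an answer or exhausts its turn budget. The following Python sketch is purely illustrative: `llm_step` and `search` are hypothetical stubs standing in for the model policy and search tool, not ASearcher's actual API; only the 40-turn budget is taken from the summary.

```python
# Minimal, illustrative sketch of a long-horizon search-agent loop.
# `search` and `llm_step` are hypothetical stubs, not ASearcher's interface.

def search(query):
    """Stub search tool: returns a canned snippet for a known query."""
    corpus = {"capital of france": "Paris is the capital of France."}
    return corpus.get(query.lower(), "No results found.")

def llm_step(history):
    """Stub policy: picks the next action from the trajectory so far.
    A real agent would call an LLM here; this stub searches once, then answers."""
    if not any(kind == "result" for kind, _ in history):
        return ("search", "capital of France")
    return ("answer", "Paris")

def run_agent(question, max_turns=40):
    """Alternate tool calls and reasoning until an answer or the turn cap.
    The 40-turn cap mirrors the long-horizon budget mentioned in the summary."""
    history = [("question", question)]
    for _ in range(max_turns):
        kind, content = llm_step(history)
        if kind == "answer":
            return content
        history.append(("query", content))
        history.append(("result", search(content)))
    return None  # turn budget exhausted without an answer

print(run_agent("What is the capital of France?"))  # -> Paris
```

In the actual system, the reward signal from answer correctness is what the asynchronous RL framework optimizes across many such trajectories running in parallel.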

Takeaways, Limitations

Takeaways:
A novel approach to improving the search capabilities of LLM-based agents.
A scalable and efficient asynchronous RL training framework.
Performance gains from the agent generating its own high-quality QA datasets.
Demonstrated feasibility of learning complex long-horizon search strategies.
Superior performance compared to existing open-source agents.
Open-source release of the model, data, and code promotes further research.
Limitations:
ASearcher's performance improvements may be limited to the evaluated benchmarks (xBench, GAIA).
Generalization to diverse real-world search tasks remains to be verified.
Further analysis of the quality and potential biases of the self-generated training data is needed.
The explainability and reliability of the agent's behavior require further study.