Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents

Created by
  • Haebom

Authors

Xuan-Phi Nguyen, Shrey Pandit, Revanth Gangi Reddy, Austin Xu, Silvio Savarese, Caiming Xiong, Shafiq Joty

Outline

This paper focuses on developing an autonomous single-agent model for Deep Research (DR). Unlike existing multi-agent systems, it presents an autonomous model in which a single agent dynamically decides its next action based on the current context, using only a minimal tool set for web crawling and Python execution. Rather than relying on existing pre-trained or instruction-tuned LLMs as-is, the authors propose strengthening agentic capabilities through continual reinforcement learning (RL) on reasoning-optimized models. Applying a simple RL recipe trained entirely on synthetic data to several open-source LLMs, their best variant, SFR-DR-20B, scores up to 28.7% on the Humanity's Last Exam benchmark. The paper also presents an in-depth experimental analysis of the proposed methodology.
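To make the single-agent idea concrete, here is a minimal Python sketch of an agent loop in which one policy chooses, at each step, whether to call a tool or to answer. It is not the authors' implementation. The tool names `search_web` and `run_python`, the `Action`/`Trajectory` types, and `toy_policy` (a scripted stand-in for the LLM) are all hypothetical, and the RL training the paper applies to the policy is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stub tools standing in for web crawling and Python execution.
# They return canned results so the sketch runs offline.
def search_web(query: str) -> str:
    return f"[stub search results for: {query}]"

def run_python(code: str) -> str:
    # Toy evaluator for a single expression; NOT a safe sandbox.
    return str(eval(code, {"__builtins__": {}}, {}))

TOOLS: dict[str, Callable[[str], str]] = {
    "search_web": search_web,
    "run_python": run_python,
}

@dataclass
class Action:
    kind: str        # "tool" or "answer"
    name: str = ""   # tool name when kind == "tool"
    arg: str = ""    # tool input, or the final answer text

@dataclass
class Trajectory:
    question: str
    steps: list[tuple[Action, str]] = field(default_factory=list)

def toy_policy(traj: Trajectory) -> Action:
    # Hypothetical scripted stand-in for the LLM: search, compute, then answer.
    if len(traj.steps) == 0:
        return Action("tool", "search_web", traj.question)
    if len(traj.steps) == 1:
        return Action("tool", "run_python", "6 * 7")
    return Action("answer", arg=traj.steps[-1][1])

def run_agent(question: str, policy, max_steps: int = 8) -> tuple[str, Trajectory]:
    traj = Trajectory(question)
    for _ in range(max_steps):
        action = policy(traj)  # the single agent picks its own next action
        if action.kind == "answer":
            return action.arg, traj
        observation = TOOLS[action.name](action.arg)
        traj.steps.append((action, observation))  # history feeds the next decision
    return "", traj  # step budget exhausted without an answer
```

For example, `run_agent("What is 6 times 7?", toy_policy)` returns the answer `"42"` after two tool calls. The key design point is that no orchestrator assigns roles. The same policy sees the whole trajectory and decides when to search, when to compute, and when to stop, which is the behavior an RL stage would be trying to improve.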

Takeaways, Limitations

Takeaways:
A novel approach to developing autonomous Deep Research models based on a single agent is presented.
An effective method for strengthening agentic capabilities through continual reinforcement learning, while preserving reasoning ability, is presented.
We demonstrate the applicability of a simple RL recipe using only synthetic data to various open-source LLMs.
Strong results on the Humanity's Last Exam benchmark (up to 28.7% for SFR-DR-20B).
Limitations:
Since the models were trained using only synthetic data, their generalization to real-world data needs to be verified.
Performance evaluation on benchmarks other than Humanity's Last Exam is needed.
Although the tool set is deliberately minimal (web crawling and Python), further research is needed to assess its efficiency and scalability in real-world deep research settings.
The complex reasoning process of the single-agent model is difficult to interpret.