Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Created by
  • Haebom

Authors

Zhiheng Xi, Jixuan Huang, Chenyang Liao, Baodai Huang, Honglin Guo, Jiaqi Liu, Rui Zheng, Junjie Ye, Jiazheng Zhang, Wenxiang Chen, Wei He, Yiwen Ding, Guanyu Li, Zehui Chen, Zhengyin Du, Xuesong Yao, Yufei Xu, Jiecao Chen, Tao Gui, Zuxuan Wu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang

Outline

AgentGym-RL is a reinforcement learning framework for training autonomous LLM agents from scratch, without supervised fine-tuning, to make the sequences of decisions needed to solve complex real-world problems in diverse environments. It has a modular, decoupled architecture, covers a variety of real-world scenarios, and supports mainstream reinforcement learning algorithms. The authors also propose ScalingInter-RL, a training method designed to balance exploration and exploitation and to keep reinforcement learning optimization stable. Early in training it emphasizes exploitation by limiting the number of interactions per episode; it then gradually raises that limit, shifting toward exploration over longer horizons to encourage diverse problem-solving strategies. Experiments show that the trained agents perform on par with or better than commercial models on 27 tasks across diverse environments. The authors plan to open-source the entire AgentGym-RL framework, including code and datasets.
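
The staged horizon described above can be pictured with a small Python sketch. This is only an illustration of the idea of raising the interaction limit as training progresses, not the authors' implementation: the function names, the step-wise staircase shape, and every default value (start_turns, turns_increment, steps_per_stage, max_turns_cap) are assumptions made for this example and do not come from the paper.

```python
from typing import Any, Callable, List, Tuple


def horizon_for_step(
    step: int,
    start_turns: int = 5,        # hypothetical initial interaction limit
    turns_increment: int = 5,    # hypothetical growth per stage
    steps_per_stage: int = 100,  # hypothetical stage length in training steps
    max_turns_cap: int = 30,     # hypothetical upper bound on the limit
) -> int:
    """Return the maximum number of agent-environment turns allowed at a training step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    stage = step // steps_per_stage
    return min(start_turns + stage * turns_increment, max_turns_cap)


def rollout(
    act: Callable[[Any], Any],
    env_step: Callable[[Any], Tuple[Any, float, bool]],
    observation: Any,
    max_turns: int,
) -> List[Tuple[Any, float]]:
    """Collect one episode, cutting it off once max_turns interactions are used."""
    trajectory = []
    for _ in range(max_turns):
        action = act(observation)
        observation, reward, done = env_step(action)
        trajectory.append((action, reward))
        if done:
            break
    return trajectory


if __name__ == "__main__":
    # Short horizons early in training, longer horizons later.
    for step in (0, 99, 100, 250, 1000):
        print(step, horizon_for_step(step))
```

In a training loop, the value returned by horizon_for_step would be passed as max_turns when collecting episodes, so early updates see only short interactions and later updates see progressively longer ones.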

Takeaways, Limitations

Takeaways:
The paper presents a framework for training autonomous LLM agents that can solve complex real-world problems in diverse environments without supervised fine-tuning.
The modular architecture provides flexibility and scalability.
The proposed ScalingInter-RL training method balances exploration and exploitation to promote stable reinforcement learning optimization and diverse problem-solving strategies.
The trained agents perform on par with or better than commercial models on 27 diverse tasks.
The planned open-source release of the AgentGym-RL framework contributes to the research community.
Limitations:
This paper presents only initial results, and further research is needed on long-term stability and scalability.
Although the framework supports various environments, its generalization to other real-world environments requires further verification.
Further research is needed on the optimal parameter settings and generalizability of ScalingInter-RL.