Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

MSARL: Decoupling Reasoning and Tool Use with Multi-Small-Agent Reinforcement Learning

Created by
  • Haebom

Authors

Dayu Wang, Jiaye Yang, Weikang Li, Jiahui Liang, Yang Li

Outline

MSARL is a multi-agent reinforcement learning framework in which multiple small agents collaborate through a division of labor. Existing tool-integrated reasoning systems rely on a single large model that interleaves long-horizon reasoning with precise tool manipulation, which leads to cognitive overload and unstable coordination. MSARL instead explicitly separates reasoning from tool use. A reasoning agent decomposes the problem and plans tool invocations, while multiple tool agents each specialize in a particular external tool; the tool agents are trained with a combination of imitation learning and reinforcement learning using role-specific rewards. On mathematical problem solving with code execution, MSARL significantly improves reasoning stability and final-answer accuracy over single-agent baselines. Furthermore, the architecture generalizes to diverse tool-use tasks, indicating that separating cognitive roles across small agents is a scalable blueprint for designing multi-agent AI systems.
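To make the division of labor concrete, here is a minimal sketch of one episode, assuming a simple turn-based protocol: the reasoning agent emits either a thought with a delegated sub-task or a final answer, and a router hands each sub-task to the tool agent registered under that tool's name. All names here (`Step`, `ToolCall`, `run_episode`, `python_tool_agent`, `toy_reasoner`) are hypothetical, and both "agents" are stubs standing in for small language models; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    tool: str     # name of the specialist tool agent to route to (hypothetical)
    request: str  # sub-task written by the reasoning agent


@dataclass
class Step:
    thought: str
    call: Optional[ToolCall] = None
    answer: Optional[str] = None


def python_tool_agent(request: str) -> str:
    """Hypothetical code tool agent: stub that evaluates the request as an expression.

    A real tool agent would be a small model that writes code for the sub-task.
    Note: eval with empty builtins is NOT a secure sandbox; illustration only.
    """
    try:
        return str(eval(request, {"__builtins__": {}}))
    except Exception as exc:
        return f"error: {exc}"


def run_episode(reasoner: Callable[[List[str]], Step],
                tool_agents: Dict[str, Callable[[str], str]],
                question: str, max_turns: int = 8) -> Optional[str]:
    """Reasoning agent plans; tool agents execute; observations flow back."""
    context = [f"Question: {question}"]
    for _ in range(max_turns):
        step = reasoner(context)
        context.append(f"Thought: {step.thought}")
        if step.answer is not None:
            return step.answer
        if step.call is not None:
            observation = tool_agents[step.call.tool](step.call.request)
            context.append(f"Observation[{step.call.tool}]: {observation}")
    return None  # no answer within the turn budget


def toy_reasoner(context: List[str]) -> Step:
    """Stub reasoning agent: delegates arithmetic once, then answers."""
    observations = [c for c in context if c.startswith("Observation")]
    if not observations:
        return Step("Delegate the arithmetic to the code agent.",
                    call=ToolCall("python", "17 * 23"))
    value = observations[-1].split(": ", 1)[1]
    return Step("Read the tool result and answer.", answer=value)


print(run_episode(toy_reasoner, {"python": python_tool_agent}, "What is 17 * 23?"))
```

The reasoning agent never touches the tool directly; it only sees observations. That separation is what lets each side be a smaller model trained for its own role.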

Takeaways, Limitations

Takeaways:
We demonstrate that a multi-agent system built from small agents can reduce interference from cognitive overload and improve reasoning stability and accuracy.
A design that clearly separates reasoning from tool use suggests a scalable architecture that can generalize to a variety of tool use tasks.
Training methods that combine imitation learning and reinforcement learning enable efficient learning of tool agents.
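The combination of imitation and reinforcement learning with role-specific rewards can be illustrated with a generic objective: a policy-gradient term on sampled trajectories plus a weighted negative log-likelihood term on expert demonstrations. The sketch below uses plain REINFORCE with a mean-reward baseline as a stand-in, and the reward functions and `il_weight` are hypothetical choices; the paper's actual RL algorithm, reward definitions, and weighting are not specified here. In practice the log-probabilities would be autodiff tensors so the scalar loss could be backpropagated.

```python
from typing import List


def reasoner_reward(predicted: str, gold: str) -> float:
    # Hypothetical role-specific reward: reasoning agent scored on the final answer.
    return 1.0 if predicted.strip() == gold.strip() else 0.0


def tool_agent_reward(observation: str) -> float:
    # Hypothetical role-specific reward: tool agent scored on producing a call that runs.
    return 0.0 if observation.startswith("error") else 1.0


def combined_loss(sample_logps: List[float], rewards: List[float],
                  demo_logps: List[float], il_weight: float = 0.5) -> float:
    """REINFORCE with a mean-reward baseline plus an imitation (NLL) term.

    sample_logps: summed token log-probs of each sampled trajectory
    rewards:      role-specific reward of each sampled trajectory
    demo_logps:   summed token log-probs of each expert demonstration
    """
    baseline = sum(rewards) / len(rewards)
    # Policy-gradient surrogate: minimize -(R - b) * log pi(trajectory)
    rl_term = -sum((r - baseline) * lp
                   for r, lp in zip(rewards, sample_logps)) / len(rewards)
    # Imitation term: negative log-likelihood of the demonstrations
    il_term = -sum(demo_logps) / len(demo_logps)
    return rl_term + il_weight * il_term


print(combined_loss([-1.0, -2.0], [1.0, 0.0], [-0.5]))
```

Because each agent gets its own reward function, the tool agent can be optimized for reliable execution without being penalized for reasoning errors it did not make.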
Limitations:
The evaluation currently focuses on mathematical problem solving with code execution, so further research is needed on how well the approach generalizes to other types of tasks.
Further research may be needed on efficient cooperation and coordination mechanisms among multiple small agents.
Further validation of scalability and stability for application to complex real-world problems is required.