Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs

Created by
  • Haebom

Authors

Bruce Yang, Xinfeng He, Huan Gao, Yifan Cao, Xiaofan Li, David Hsu

Outline

This paper emphasizes the importance of effective prompt design for improving the planning ability of large language model (LLM) agents, and points out the limitations of existing structured prompting strategies: they are single-agent, plan-centric, and evaluated only on task accuracy. To address these shortcomings, the authors present CodeAgents, a prompting framework that enables structured, token-efficient planning in multi-agent settings. CodeAgents encodes every component of agent interaction, including tasks, plans, feedback, system roles, and external tool calls, as modular pseudocode rich in control structures (loops, conditionals), Boolean logic, and typed variables. This turns loosely coupled agent plans into cohesive, interpretable, and verifiable multi-agent reasoning programs. Evaluated on three benchmarks (GAIA, HotpotQA, and VirtualHome) with several LLMs, the approach improves planning performance by 3-36% over natural-language prompting baselines and achieves a state-of-the-art success rate of 56% on VirtualHome. It also reduces input and output token usage by 55-87% and 41-70%, respectively, highlighting the importance of token-aware evaluation metrics for building scalable multi-agent LLM systems. Code and materials are available at https://anonymous.4open.science/r/CodifyingAgent-5A86.
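To make the idea concrete, below is a minimal, hypothetical sketch (in Python) of what a codified multi-agent plan in this spirit might look like. The agent names, tool names, and template structure are illustrative assumptions, not taken from the paper or its released code.

```python
# A minimal, hypothetical sketch of a "codified" multi-agent plan in the spirit
# of CodeAgents: task, roles, tool calls, and control flow are expressed as
# pseudocode rather than free-form prose. All identifiers below (solver,
# verifier, web_search, ...) are illustrative and not taken from the paper.

CODIFIED_PLAN_TEMPLATE = """\
# Roles
agent solver: decomposes the task and proposes answers
agent verifier: checks proposed answers against retrieved evidence

# Typed variables
question: str = {question!r}
evidence: list[str] = []
answer: str | None = None

# Plan: control structures and tool calls instead of prose
for hop in range(2):                          # multi-hop retrieval loop
    query: str = solver.decompose(question, evidence)
    evidence += web_search(query, top_k=3)    # external tool invocation
    if verifier.is_sufficient(question, evidence):
        break
answer = solver.answer(question, evidence)
assert verifier.supported(answer, evidence)   # Boolean check before returning
return answer
"""


def build_codified_prompt(question: str) -> str:
    """Render the pseudocode template for a concrete task before sending it
    to each LLM agent as its (token-compact) prompt."""
    return CODIFIED_PLAN_TEMPLATE.format(question=question)


if __name__ == "__main__":
    print(build_codified_prompt("Which film won Best Picture in 1998?"))
```

The point of the codified form is that one compact program carries the roles, the plan, and the verification step, rather than spelling each out in verbose natural language for every agent turn.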

Takeaways, Limitations

Takeaways:
Presents the CodeAgents framework, which is effective at improving the planning capabilities of LLM-based agents in multi-agent settings.
Produces structured, interpretable, and verifiable multi-agent reasoning programs through modular pseudocode.
Improves planning performance by 3-36% (including a 56% success rate on VirtualHome) while greatly reducing token usage, aiding scalability.
Emphasizes the importance of token-aware evaluation metrics; a rough illustrative sketch follows this list.
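As a rough illustration of what a token-aware metric could look like, here is a small hypothetical sketch that reports success rate alongside token cost. The field names and the success-per-1k-tokens formulation are assumptions made for illustration, not the paper's actual metric.

```python
# Hypothetical sketch of a token-aware evaluation metric: report success rate
# together with the tokens spent to achieve it, e.g. success per 1k tokens.
# Field names and numbers below are assumptions, not from the paper.

from dataclasses import dataclass


@dataclass
class EpisodeLog:
    success: bool
    input_tokens: int
    output_tokens: int


def token_aware_score(logs: list[EpisodeLog]) -> dict[str, float]:
    """Success rate normalized by average token cost per episode."""
    n = len(logs)
    success_rate = sum(log.success for log in logs) / n
    avg_tokens = sum(log.input_tokens + log.output_tokens for log in logs) / n
    return {
        "success_rate": success_rate,
        "avg_tokens": avg_tokens,
        "success_per_1k_tokens": 1000 * success_rate / avg_tokens,
    }


if __name__ == "__main__":
    # Toy comparison: a codified prompt vs. a natural-language baseline.
    codified = [EpisodeLog(True, 800, 300), EpisodeLog(False, 750, 280)]
    baseline = [EpisodeLog(True, 4000, 900), EpisodeLog(False, 3800, 950)]
    print("codified:", token_aware_score(codified))
    print("baseline:", token_aware_score(baseline))
```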
Limitations:
Further research is needed on the generalizability of the framework and its applicability to diverse environments.
Performance differences across LLM types and sizes still need to be analyzed.
As the complexity of the pseudocode grows, it may become harder to interpret and maintain.
More detail is needed on the specific contents and usability of the released code and materials.