Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks

Created by
  • Haebom

Authors

Ruofan Lu, Yichen Li, Yintong Huo

Outline

This paper presents a benchmark of 34 programmable tasks for evaluating how well large language model (LLM)-based autonomous agent systems automate complex tasks. We evaluate three open-source agent frameworks combined with two LLM backbones, which achieve task completion rates of roughly 50%. Through in-depth failure analysis, we develop a three-tier failure taxonomy aligned with the stages of task execution: planning errors, task execution issues, and incorrect response generation. We then propose actionable improvements to strengthen agents' planning and self-diagnosis capabilities. Together, the failure taxonomy and mitigation strategies provide an empirical foundation for developing more robust and effective autonomous agent systems.
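
To make the three-tier taxonomy concrete, below is a minimal sketch of how such failure labels might be represented and assigned in code. The category names follow the taxonomy described above; the AgentTrace record and the classify_failure heuristic are hypothetical illustrations under assumed semantics, not the paper's actual implementation.

# Hypothetical sketch of the three-tier failure taxonomy described above.
# Category names follow the paper's taxonomy; the trace structure and the
# classification heuristic are illustrative assumptions only.
from dataclasses import dataclass
from enum import Enum


class FailureTier(Enum):
    PLANNING_ERROR = "planning error"                     # flawed or infeasible plan
    TASK_EXECUTION_ISSUE = "task execution issue"         # sound plan, but a step failed
    INCORRECT_RESPONSE = "incorrect response generation"  # steps succeeded, final answer wrong


@dataclass
class AgentTrace:
    """Hypothetical record of one agent run on one benchmark task."""
    plan_valid: bool       # did the agent produce a feasible plan?
    steps_succeeded: bool  # did every plan step execute without error?
    answer_correct: bool   # did the final response satisfy the task?
    notes: str = ""


def classify_failure(trace: AgentTrace) -> FailureTier | None:
    """Map a failed run onto the taxonomy; return None for a successful run."""
    if trace.answer_correct:
        return None
    if not trace.plan_valid:
        return FailureTier.PLANNING_ERROR
    if not trace.steps_succeeded:
        return FailureTier.TASK_EXECUTION_ISSUE
    return FailureTier.INCORRECT_RESPONSE


if __name__ == "__main__":
    run = AgentTrace(plan_valid=True, steps_succeeded=True, answer_correct=False,
                     notes="all tool calls succeeded, but the final summary was wrong")
    print(classify_failure(run))  # FailureTier.INCORRECT_RESPONSE

The point of such a tiered scheme is that each failed run is attributed to the earliest stage at which things went wrong, which is what makes stage-specific mitigations (better planning, better self-diagnosis) actionable.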

Takeaways, Limitations

Takeaways:
We provide an in-depth analysis of the performance and failure causes of LLM-based autonomous agent systems.
We present actionable improvements to enhance agents' planning and self-diagnosis capabilities.
We provide an empirical foundation for the development of future autonomous agent systems.
We present a new benchmark for evaluating autonomous agent systems.
Limitations:
The evaluation covers only a limited set of agent frameworks and LLM backbones.
The benchmark's tasks may be limited in variety and difficulty.
The scope of the failure-cause analysis may be limited.
The effectiveness of the proposed improvements requires further validation.