Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Butterfly Effects in Toolchains: A Comprehensive Analysis of Failed Parameter Filling in LLM Tool-Agent Systems

Created by
  • Haebom

Authors

Qian Xiong, Yuekai Huang, Ziyou Jiang, Zhiyuan Chang, Yujia Zheng, Tianhao Li, Mingyang Li

Outline

This paper addresses parameter failures, which limit the effectiveness of the tool-agent paradigm for extending the capabilities of large language models (LLMs). First, we construct a taxonomy of five parameter failure categories derived from the invocation chains of mainstream tool agents. Using 15 input perturbation methods, we then examine how three different input sources correlate with these failure categories. The experimental results show that parameter name hallucination stems mainly from the inherent limitations of LLMs, whereas the other failure patterns are caused mainly by problems in the input sources. To improve the reliability and effectiveness of tool-agent interactions, we propose several improvements, including standardizing tool return formats, improving error feedback mechanisms, and ensuring parameter consistency.
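To make two of these suggestions concrete (a uniform return format and more informative error feedback), here is a minimal sketch. It is not taken from the paper: `ToolSpec`, `check_call`, the `get_weather` tool, and every field name are hypothetical and chosen only for illustration. The idea is to compare the argument names an LLM produced against the parameters a tool declares, and to report undeclared or missing names in the same result shape every time.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical tool description: a name plus its declared parameters."""
    name: str
    required: set[str]
    optional: set[str] = field(default_factory=set)


def check_call(spec: ToolSpec, arguments: dict) -> dict:
    """Compare LLM-produced argument names with the tool's declared parameters.

    Always returns a dict with the same keys, so the agent sees one
    consistent format whether the call is valid or not.
    """
    allowed = spec.required | spec.optional
    unknown = sorted(set(arguments) - allowed)        # names the tool never declared
    missing = sorted(spec.required - set(arguments))  # required names left unfilled
    return {
        "tool": spec.name,
        "ok": not unknown and not missing,
        "unknown_parameters": unknown,
        "missing_parameters": missing,
        "expected_parameters": sorted(allowed),
    }


# Illustrative tool and a call that uses an undeclared name ("location").
weather = ToolSpec(name="get_weather", required={"city"}, optional={"unit"})
result = check_call(weather, {"location": "Seoul", "unit": "celsius"})
print(result)
```

Returning the expected parameter names alongside the offending ones gives the model something specific to correct on a retry, rather than a bare failure message.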

Takeaways, Limitations

Takeaways:
Presents a systematic classification scheme for parameter failures in tool agents.
Experimentally clarifies the correlation between different input sources and failure categories.
Suggests specific measures to improve the reliability and effectiveness of tool-agent interactions.
Limitations:
Further research is needed on the generality and scope of applicability of the proposed classification scheme and improvement measures.
The limitations of the input perturbation methods and the dataset used in the experiments need to be taken into account.
Performance evaluation and verification in real-world application environments are still required.