Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ToolACE: Winning the Points of LLM Function Calling

Created by
  • Haebom

Authors

Weiwen Liu, Xu Huang, Xingshan Zeng, Liu, Enhong Chen

Outline

This paper presents ToolACE, a pipeline for generating high-quality, diverse training data to improve the function-calling capability of large language models. To overcome the limitations of existing synthetic data generation methods, ToolACE builds a pool of 26,507 diverse APIs through a self-evolving synthesis process, then generates conversational data through interactions among multiple agents, guided by a formalized thinking process. Rule-based and model-based checks together verify data accuracy. An 8-billion-parameter model trained on the generated data achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard. The model and a subset of the data are publicly available.
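As an illustration of what a rule-based check on generated function calls might look like, here is a minimal sketch. It is not ToolACE's actual verifier: the `WEATHER_API` definition, the schema layout, and the `check_call` helper are all hypothetical, assumed only for this example. It tests three simple rules: the function name matches the API, required parameters are present, and argument types agree with the declared types.

```python
from typing import Any, Dict, List

# Hypothetical API definition in a JSON-Schema-like layout (not taken from ToolACE).
WEATHER_API: Dict[str, Any] = {
    "name": "get_weather",
    "parameters": {
        "city": {"type": "string", "required": True},
        "days": {"type": "integer", "required": False},
    },
}

# Map schema type names to the Python types used for checking.
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_call(call: Dict[str, Any], api: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the call passes."""
    errors: List[str] = []

    # Rule 1: the called function must be the one the API defines.
    if call.get("name") != api["name"]:
        return [f"unknown function: {call.get('name')}"]

    args = call.get("arguments", {})
    params = api["parameters"]

    # Rule 2: every required parameter must be supplied.
    for pname, spec in params.items():
        if spec.get("required") and pname not in args:
            errors.append(f"missing required parameter: {pname}")

    # Rule 3: every supplied argument must be declared and have the right type.
    for aname, value in args.items():
        if aname not in params:
            errors.append(f"unexpected parameter: {aname}")
            continue
        type_name = params[aname]["type"]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if is_stray_bool or not isinstance(value, TYPE_MAP[type_name]):
            errors.append(f"wrong type for {aname}: expected {type_name}")

    return errors


good = {"name": "get_weather", "arguments": {"city": "Paris", "days": 3}}
bad = {"name": "get_weather", "arguments": {"days": "3"}}
print(check_call(good, WEATHER_API))
print(check_call(bad, WEATHER_API))
```

Checks like these are cheap and deterministic, so they can filter out structurally invalid samples first. The model-based layer described in the paper targets problems that simple rules cannot catch.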

Takeaways, Limitations

Takeaways:
Presents ToolACE, a novel data generation pipeline that improves the function-calling performance of large language models.
Addresses the insufficient coverage and accuracy of existing synthetic data.
Generates diverse and complex data through a self-evolving synthesis process and multi-agent interactions.
Ensures high data accuracy through rule-based and model-based verification.
Achieves state-of-the-art performance even with a relatively small model (8 billion parameters).
Promotes research sharing and progress by releasing the model and part of the generated data.
Limitations:
The generalization performance of models trained on ToolACE-generated data needs further validation.
The differences from real data, and the limitations they imply, are not clearly discussed.
Dependence on a specific API pool may reduce generality.
Applicability to a wider range of function-calling scenarios still needs to be examined.