This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
In this paper, we present APIGen-MT, a novel framework for generating high-quality data to train AI agents for multi-turn interactions. APIGen-MT operates in two phases: first, an agentic pipeline produces verified task blueprints using an LLM reviewer and an iterative feedback loop; second, these blueprints are expanded into complete interaction trajectories through simulated human-agent interplay. The xLAM-2-fc-r model series (1 billion to 70 billion parameters) trained on this data outperforms state-of-the-art models such as GPT-4o and Claude 3.5 on the $\tau$-bench and BFCL benchmarks, with the smaller models surpassing much larger frontier models, especially in multi-turn settings. To advance AI agent research, we open-source 5,000 synthetic interaction trajectories and the trained xLAM-2-fc-r models.
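The two-phase flow described above (blueprint generation with reviewer feedback, followed by trajectory simulation) can be sketched roughly as follows. This is a minimal illustration only: all function names, data fields, and the stub logic standing in for LLM calls are hypothetical, not the paper's actual implementation.

```python
# Hypothetical sketch of a two-phase synthetic-data pipeline in the spirit
# of APIGen-MT. Stub functions stand in for real LLM calls; every name and
# field here is illustrative, not from the paper's codebase.

def propose_blueprint(seed, feedback=None):
    # Phase 1: an LLM drafts a task blueprint (user intent plus the
    # groundtruth API calls that solve it), optionally using feedback
    # from a previous rejected draft.
    actions = [f"lookup({seed})"]
    if feedback:  # refine the draft based on reviewer feedback
        actions.append(f"verify({seed})")
    return {"intent": f"task about {seed}", "groundtruth_actions": actions}

def review_blueprint(blueprint):
    # An LLM reviewer checks the blueprint for correctness and
    # executability, returning (approved, feedback).
    if len(blueprint["groundtruth_actions"]) < 2:
        return False, "add a verification step"
    return True, None

def simulate_trajectory(blueprint):
    # Phase 2: a simulated human and an agent interact; the trajectory
    # is kept only if the agent's actions match the groundtruth.
    return [{"user": blueprint["intent"], "agent_action": a}
            for a in blueprint["groundtruth_actions"]]

def generate_training_sample(seed, max_rounds=3):
    # Iterative feedback loop: propose, review, refine; discard seeds
    # that never pass review within the round budget.
    feedback = None
    for _ in range(max_rounds):
        blueprint = propose_blueprint(seed, feedback)
        approved, feedback = review_blueprint(blueprint)
        if approved:
            return {"blueprint": blueprint,
                    "trajectory": simulate_trajectory(blueprint)}
    return None

sample = generate_training_sample("order_status")
```

The key design choice this sketch captures is that validation happens on the compact blueprint (cheap to review and revise) before the expensive multi-turn simulation is run, so only approved task specifications are expanded into full trajectories.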
Takeaways, Limitations
•
Takeaways:
◦
Presents an effective framework (APIGen-MT) for generating high-quality multi-turn interaction data.
◦
Develops the xLAM-2-fc-r model series, which outperforms existing state-of-the-art models.
◦
Demonstrates strong performance of small models, especially in multi-turn settings.
◦
Contributes to research advancement by open-sourcing 5,000 synthetic interaction trajectories and the trained models.
•
Limitations:
◦
No clear validation of how closely the simulated data matches real-world interaction data.
◦
Heavy reliance on LLM reviewers means reviewer bias may influence the generated data and results.
◦
Benchmark-based evaluation may not fully reflect performance in real application environments.