Structured Agent Distillation for Large Language Model
Created by
Haebom
Author
Jun Liu, Zhenglun Kong, Peiyan Dong, Changdi Yang, Tianqi Li, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Pu Zhao, Xue Lin, Dong Huang, Yanzhi Wang
Structured Agent Distillation
Outline
Large language model (LLM)-based agents demonstrate strong decision-making capabilities by combining reasoning and acting, but high inference costs and large model sizes limit their practical deployment. This paper proposes Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving reasoning fidelity and action consistency. Unlike standard token-level distillation, the method segments trajectories into [REASON] and [ACT] spans and applies segment-wise losses to align each component with the teacher's behavior. This structure-aware supervision enables the compact agent to better replicate the teacher's decision-making process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that the method consistently outperforms token-level and imitation learning baselines, achieving substantial compression with minimal performance degradation. Scaling and pruning results further highlight the importance of segment-level alignment for efficient, deployable agents.
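To make the segment-wise supervision concrete, the sketch below shows one way such a loss could look in PyTorch. The function name, the mask construction, the per-segment weights, and the temperature are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Minimal sketch of a segment-wise distillation loss (illustrative only).
# Assumes teacher and student share a vocabulary, and boolean masks mark
# which trajectory tokens fall inside [REASON] vs. [ACT] segments.
import torch
import torch.nn.functional as F

def segment_distill_loss(student_logits, teacher_logits,
                         reason_mask, act_mask,
                         w_reason=1.0, w_act=1.0, temperature=2.0):
    """KL(teacher || student), averaged separately over [REASON] and [ACT] tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    reason_mask, act_mask: (batch, seq_len) boolean, disjoint
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Per-token KL divergence, summed over the vocabulary
    # (the usual T**2 gradient rescaling is omitted for brevity).
    kl = (p_teacher * (p_teacher.clamp_min(1e-9).log() - log_p_student)).sum(-1)

    def masked_mean(values, mask):
        return (values * mask).sum() / mask.sum().clamp_min(1)

    loss_reason = masked_mean(kl, reason_mask.float())
    loss_act = masked_mean(kl, act_mask.float())
    return w_reason * loss_reason + w_act * loss_act
```

In practice, the two masks would presumably be derived by locating the [REASON] and [ACT] span boundaries in the teacher's trajectory and mapping them onto token positions.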
Takeaways, Limitations
•
Presents an effective method for reducing model size while preserving the reasoning ability and behavioral consistency of large LLM-based agents.
•
Achieves strong performance in a ReAct-style framework, outperforming token-level distillation and imitation learning approaches.
•
Experiments across diverse environments (ALFWorld, HotPotQA-ReAct, and WebShop) verify the generality of the method.
•
Highlights the importance of segment-level alignment, contributing to the development of efficient, deployable agents.
•
Provides little in-depth discussion of specific model architectures or implementation details.
•
Additional task-specific optimization may be needed for particular target environments.
•
Lacks analysis of issues that may arise in real-world deployment (e.g., latency).