Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

EvoP: Robust LLM Inference via Evolutionary Pruning

Created by
  • Haebom

Author

Shangyu Wu, Hongchao Du, Ying Xiong, Shuai Chen, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

Outline

This paper proposes EvoP, an evolutionary pruning framework, to address the problem of deploying large language models (LLMs) in resource-constrained environments. To overcome the performance degradation and the neglect of data characteristics in existing heuristic-based pruning methods, EvoP introduces a cluster-based calibration dataset sampling (CCDS) strategy to generate diverse calibration datasets, and an evolutionary pruning pattern search (EPPS) method to identify optimal pruning patterns. Experiments on various LLMs and subtasks demonstrate the effectiveness of EvoP, showing it to be a practical and scalable solution for deploying LLMs in real-world applications.
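The evolutionary search idea behind EPPS can be illustrated with a toy sketch: candidate pruning patterns are binary masks over model layers, and a population of masks is iteratively selected and mutated toward lower loss. Everything concrete here is an assumption for illustration (the layer count, pruning budget, and especially the fitness function, which in the paper would be the pruned model's loss on the CCDS-sampled calibration data):

```python
import random

random.seed(0)  # deterministic for reproducibility

NUM_LAYERS = 12    # layers in the model (assumption)
PRUNE_BUDGET = 4   # number of layers to prune (assumption)
POP_SIZE = 20
GENERATIONS = 30

def random_mask():
    """A mask with exactly PRUNE_BUDGET layers marked for pruning (1)."""
    idx = set(random.sample(range(NUM_LAYERS), PRUNE_BUDGET))
    return tuple(1 if i in idx else 0 for i in range(NUM_LAYERS))

def fitness(mask):
    """Stand-in objective (lower is better): penalize pruning early layers.
    A real implementation would evaluate the pruned model's loss on the
    calibration dataset instead."""
    return sum((NUM_LAYERS - i) * bit for i, bit in enumerate(mask))

def mutate(mask):
    """Swap one pruned layer with one kept layer, preserving the budget."""
    mask = list(mask)
    pruned = [i for i, b in enumerate(mask) if b]
    kept = [i for i, b in enumerate(mask) if not b]
    i, j = random.choice(pruned), random.choice(kept)
    mask[i], mask[j] = 0, 1
    return tuple(mask)

def evolve():
    """Elitist evolutionary search: keep the best half, mutate to refill."""
    population = [random_mask() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=fitness)
        survivors = population[: POP_SIZE // 2]
        children = [mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + children
    return min(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Under this toy objective the search converges toward pruning the latest layers; the actual method searches the same kind of discrete mask space but scores candidates on real calibration data.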

Takeaways, Limitations

Takeaways:
EvoP is a novel evolutionary pruning framework that overcomes the limitations of existing heuristic-based LLM pruning methods.
The CCDS strategy improves pruning performance by generating more diverse calibration datasets.
The EPPS method effectively finds optimal pruning patterns, minimizing performance degradation.
EvoP demonstrated strong performance and efficiency across various LLMs and subtasks, suggesting practical applicability.
Limitations:
EvoP's performance improvements may be limited to specific LLMs and subtasks.
The computational cost of the EPPS search may be higher than that of conventional heuristic pruning methods.
Further research is needed to determine the optimal number of clusters for the CCDS strategy.