This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
Mosaic: Composite Projection Pruning for Resource-efficient LLMs
Created by
Haebom
Author
Bailey J. Eccles, Leon Wong, Blesson Varghese
Outline
This paper presents projection pruning, a novel fine-grained pruning method, to address the high computational and memory requirements that limit the deployment of large language models (LLMs). To overcome the limitations of existing coarse-grained pruning methods, we propose composite projection pruning, which combines unstructured pruning, which preserves accuracy, with structured pruning, which reduces model size. Based on this, we develop Mosaic, a novel system for generating and deploying pruned LLMs, and evaluate its performance and quality metrics across various hardware platforms, LLMs, and datasets. Mosaic generates models up to 7.19x faster than existing methods, achieving up to 84.2% lower perplexity and 31.4% higher accuracy. Furthermore, Mosaic models exhibit up to 67% faster inference and 68% lower GPU memory usage. Mosaic is publicly available at https://github.com/blessonvar/Mosaic .
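To make the combination concrete, here is a minimal NumPy sketch of how unstructured pruning (zeroing small weights, shape preserved) and structured pruning (dropping whole rows, shape reduced) could be composed on a single projection matrix. The function names, magnitude criteria, and ratios are illustrative assumptions, not Mosaic's actual implementation:

```python
import numpy as np

def unstructured_prune(w, sparsity):
    """Zero the smallest-magnitude weights; shape is preserved (accuracy-friendly)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    out = w.copy()
    out[np.abs(out) <= thresh] = 0.0
    return out

def structured_prune(w, keep_ratio):
    """Drop whole output rows with the lowest L2 norm; shrinks the matrix."""
    keep = max(1, int(w.shape[0] * keep_ratio))
    norms = np.linalg.norm(w, axis=1)
    idx = np.sort(np.argsort(norms)[-keep:])  # keep highest-norm rows, in order
    return w[idx]

def composite_projection_prune(w, sparsity=0.5, keep_ratio=0.75):
    """Illustrative composite: sparsify in place, then remove weak rows."""
    return structured_prune(unstructured_prune(w, sparsity), keep_ratio)
```

Applied to an 8x16 projection with `sparsity=0.5` and `keep_ratio=0.75`, this yields a 6x16 matrix in which at least half of the surviving entries are zero, illustrating how the two granularities jointly cut both compute and model size.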
◦
We present LLM pruning methods (projection pruning and composite projection pruning) that are much faster and more efficient than existing coarse-grained pruning methods.
◦
Improved quality and performance of the pruned models (lower perplexity, higher accuracy, faster inference, reduced GPU memory usage).
◦
Improved practical deployability of LLMs through the developed Mosaic system.
◦
Improved accessibility through the open-source release of the system.
•
Limitations:
◦
The reported results cover specific hardware platforms, LLMs, and datasets; further research is needed to establish generalizability to other environments.
◦
Further research is needed on optimal parameter settings for projection pruning and composite projection pruning.
◦
Additional comparative analysis of applicability and performance across a wider variety of LLM architectures is needed.