Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Created by
  • Haebom

Author

Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang

Outline

To address the challenges of deploying large language models (LLMs) in resource-constrained environments, this paper proposes NoWag (Normalized Weight and Activation Guided Compression), a unified framework for one-shot, shape-preserving compression. NoWag compresses Llama-2 (7B, 13B, 70B) and Llama-3 (8B, 70B) models using two shape-preserving techniques: vector quantization (NoWag-VQ) and unstructured/semi-structured pruning (NoWag-P). Experiments show that NoWag-VQ significantly outperforms state-of-the-art one-shot vector quantization methods and that NoWag-P is competitive with leading pruning techniques. These results highlight the commonalities between the two compression paradigms and suggest promising directions for future research. The source code is available on GitHub.
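To make the activation-guided idea concrete, here is a minimal sketch of one-shot unstructured pruning in which each weight is scored by its magnitude times the norm of the corresponding input activation, computed from a small calibration batch. This is a generic illustration of the paradigm (in the style of Wanda-like scoring), not NoWag's actual algorithm; NoWag's specific normalization and the NoWag-VQ codebook construction are described in the paper and its GitHub repository. All function and variable names here are hypothetical.

```python
import numpy as np

def activation_guided_prune(W, X, sparsity=0.5):
    """Hypothetical sketch of one-shot activation-guided unstructured pruning.

    W : (d_out, d_in) weight matrix of a linear layer.
    X : (n, d_in) calibration activations fed into that layer.
    Scores each weight by |w_ij| * ||x_j||_2 and zeroes the
    lowest-scoring fraction. NoWag's exact normalization differs;
    this only illustrates the general paradigm.
    """
    # Per-input-channel activation norms over the calibration batch.
    act_norm = np.linalg.norm(X, axis=0)            # shape (d_in,)
    scores = np.abs(W) * act_norm                   # broadcast over output rows
    k = int(sparsity * W.size)                      # number of weights to prune
    if k == 0:
        return W.copy()
    # Threshold = k-th smallest score; prune everything at or below it.
    thresh = np.partition(scores.ravel(), k - 1)[k - 1]
    mask = scores > thresh
    return W * mask

# Toy usage on a random layer with a random calibration batch.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(32, 16))
W_pruned = activation_guided_prune(W, X, sparsity=0.5)
```

Because the layer's shape is unchanged (only entries are zeroed), the pruned matrix drops into the original architecture without any re-wiring, which is what "shape-preserving" refers to.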

Takeaways, Limitations

Takeaways:
NoWag improves the efficiency of LLM compression by unifying vector quantization and pruning in a single framework.
NoWag-VQ outperforms existing state-of-the-art one-shot vector quantization techniques.
NoWag-P shows performance competitive with leading existing pruning techniques.
By revealing commonalities between the two compression paradigms, the paper suggests future research directions.
Limitations:
The experiments are limited to specific LLM families (Llama-2, Llama-3); further study is needed to establish generalizability to other models.
NoWag's performance gains may depend on specific hyperparameter settings, warranting further experiments across different configurations.
Further research is needed to evaluate NoWag's performance in real-world deployment environments.