Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Created by
  • Haebom

Author

Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang

Outline

To address the challenges of deploying large language models (LLMs) in resource-constrained environments, this paper proposes NoWag (Normalized Weight and Activation Guided Compression), a unified framework for zero-shot shape-preserving compression algorithms. NoWag compresses Llama-2 7B/13B/70B and Llama-3 8B/70B models using two forms of shape-preserving compression: vector quantization (NoWag-VQ) and unstructured/semi-structured pruning (NoWag-P). Experimental results show that NoWag-VQ significantly outperforms state-of-the-art zero-shot vector quantization methods, while NoWag-P is competitive with state-of-the-art pruning methods. These results point to commonalities between the two compression paradigms that may inform future research. The source code is available on GitHub.
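The semi-structured pruning side of the framework can be illustrated with a small sketch. The snippet below is not the authors' implementation; it is a minimal NumPy illustration of the general idea of activation-guided 2:4 semi-structured pruning, where each weight is scored by its magnitude scaled by a per-input-channel activation statistic (here a random stand-in), and within every group of four consecutive weights the two lowest-scoring ones are zeroed.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))          # weight matrix of one linear layer
act_norm = rng.uniform(0.5, 2.0, 16)  # per-input-channel activation scale (stand-in)

# Score each weight by |w| * activation scale (activation-guided importance).
scores = np.abs(W) * act_norm

# 2:4 semi-structured pruning: in every group of 4 consecutive weights
# along the input dimension, keep the 2 with the highest scores.
groups = scores.reshape(W.shape[0], -1, 4)
order = np.argsort(groups, axis=-1)                      # ascending per group
mask = np.ones_like(groups, dtype=bool)
np.put_along_axis(mask, order[..., :2], False, axis=-1)  # drop the 2 lowest
mask = mask.reshape(W.shape)

W_pruned = W * mask                   # exactly 50% sparsity, hardware-friendly layout
```

A 2:4 pattern keeps the tensor shape intact (hence "shape preserving") while yielding a sparsity structure that NVIDIA sparse tensor cores can accelerate.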

Takeaways, Limitations

Takeaways:
We propose NoWag, an effective unified framework for zero-shot shape-preserving compression algorithms.
NoWag-VQ outperforms existing state-of-the-art zero-shot vector quantization methods.
NoWag-P demonstrates competitive performance with existing state-of-the-art pruning methods.
Highlights commonalities between different compression paradigms (vector quantization and pruning), suggesting directions for future research.
Limitations:
The experimental results presented in this paper are for specific LLM models (Llama-2, Llama-3), and further research is needed to determine their generalizability to other models.
Lack of in-depth analysis of factors contributing to NoWag's performance improvement.
Further evaluation of the applicability and performance of NoWag in various resource-constrained environments is needed.