Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free

Created by
  • Haebom

Authors

Euntae Choi, Sumin Song, Woosang Lim, Sungjoo Yoo

Outline

This paper proposes a training-free method for generating rotation matrices for post-training quantization (PTQ), targeting the high computational cost of deploying large language models (LLMs). To address the performance degradation of existing rotation-based methods at very low bit widths, such as 2 bits, it reduces quantization error by grouping similar frequency components via a Walsh-Hadamard transform arranged in sequency order. Specifically, the authors propose Grouped Sequency-arranged Rotation (GSR), which uses a block-diagonal matrix composed of small Walsh blocks, effectively isolating the influence of outliers and achieving performance comparable to learning-based optimization methods. The method is validated through reasoning tasks and perplexity (PPL) evaluation on the WikiText-2 dataset, showing improvements over existing learned rotation techniques.
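As a rough illustration of the construction (a minimal sketch, not the authors' implementation; the block size of 64 and the hidden size of 4096 below are illustrative assumptions, not values from the paper), one can reorder the rows of a Hadamard matrix by sequency, i.e. by the number of sign changes, and tile small Walsh blocks along the diagonal:

```python
import numpy as np
from scipy.linalg import hadamard, block_diag

def sequency_walsh(n: int) -> np.ndarray:
    """n x n Walsh matrix: Hadamard rows reordered by sequency
    (number of sign changes per row), scaled to be orthogonal."""
    H = hadamard(n)  # Sylvester (natural) order; n must be a power of 2
    sign_changes = np.count_nonzero(np.diff(H, axis=1), axis=1)
    return H[np.argsort(sign_changes)] / np.sqrt(n)

def grouped_sequency_rotation(hidden_size: int, block: int = 64) -> np.ndarray:
    """Block-diagonal rotation built from small sequency-ordered Walsh
    blocks. The block size (64 here) is an assumed, illustrative value."""
    assert hidden_size % block == 0
    W = sequency_walsh(block)
    return block_diag(*[W] * (hidden_size // block))

# The result is orthogonal, so inserting it into a linear layer as
# (X @ R) @ (R.T @ W) leaves the layer's output unchanged while
# redistributing outlier energy within each block.
R = grouped_sequency_rotation(4096, block=64)
assert np.allclose(R @ R.T, np.eye(4096))
```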

Takeaways, Limitations

Takeaways:
  • Enables effective post-training quantization even at very low bit widths, such as 2-bit, which can significantly reduce the cost of LLM deployment.
  • As a training-free method, it avoids the computational cost of existing learning-based optimization methods.
  • It can also be applied on top of existing learned rotation techniques to improve their performance (see the sketch at the end of this section).
  • The rotation matrix generation method based on the sequency-ordered Walsh-Hadamard transform is potentially applicable to other quantization problems.
Limitations:
  • The reported performance is based on experimental results on a specific dataset (WikiText-2); generalization to other datasets and tasks requires further research.
  • Due to the nature of the Walsh-Hadamard transform, computational complexity may increase when applied to high-dimensional models.
  • Further analysis of applicability and performance across various LLM architectures is needed.
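On composing GSR with a learned rotation, as mentioned in the takeaways above: both matrices are orthogonal, so their product is again an orthogonal rotation and can be substituted in directly. A hedged sketch (not from the paper; the learned rotation here is a random orthogonal stand-in):

```python
import numpy as np

# Stand-in for a learned orthogonal rotation (random orthogonal via QR);
# in practice this would come from a learned-rotation method.
rng = np.random.default_rng(0)
R_learned, _ = np.linalg.qr(rng.standard_normal((4096, 4096)))

# grouped_sequency_rotation is defined in the sketch above.
R_combined = grouped_sequency_rotation(4096, block=64) @ R_learned

# The composition is still orthogonal, hence still a valid rotation.
assert np.allclose(R_combined @ R_combined.T, np.eye(4096))
```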