Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Scaling Laws for Task-Stratified Knowledge in Post-Training Quantized Large Language Models

Created by
  • Haebom

Author

Chenxi Zhou, Pengfei Cao, Jiang Li, Jun Zhao, Kang Liu

Outline

This paper explores post-training quantization (PTQ), a practical compression method for addressing the deployment challenges posed by the size of large language models (LLMs). The authors note that previous studies have failed to provide a comprehensive understanding of the impact of PTQ and of the scaling behavior of quantized models. They experimentally explore task-stratified scaling laws: knowledge in LLMs is decomposed into memorization and exploitation capabilities, and an integrated quantitative framework is developed encompassing model size, effective bit width, calibration set size, and group size. The results reveal that knowledge memorization is significantly more sensitive to changes in effective bit width, calibration set size, and model size than knowledge exploitation. These findings provide a granular understanding of the impact of PTQ and offer guidance for developing knowledge-aware quantization strategies that better preserve targeted cognitive capabilities.
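To make the idea of such a framework concrete, here is a minimal sketch of fitting a multiplicative power-law scaling model in log space. The functional form, variable names, and data below are illustrative assumptions for this summary, not the paper's actual equation or results: we suppose a task score depends on model size N, effective bit width B, and calibration set size C.

```python
import numpy as np

# Hypothetical power-law scaling form (an assumption, not the paper's
# exact equation): in log space the score is linear in log(N), log(B),
# and log(C), so ordinary least squares can recover the exponents.
def log_score(theta, N, B, C):
    log_a, alpha, beta, gamma = theta
    return log_a + alpha * np.log(N) + beta * np.log(B) + gamma * np.log(C)

# Synthetic observations standing in for benchmark measurements.
rng = np.random.default_rng(0)
N = rng.uniform(1e8, 1e10, 50)    # model parameters
B = rng.uniform(2.0, 8.0, 50)     # effective bit width
C = rng.uniform(128, 4096, 50)    # calibration samples
true_theta = np.array([0.5, 0.05, 0.3, 0.02])
y = log_score(true_theta, N, B, C) + rng.normal(0.0, 0.01, 50)

# Fit the exponents by linear least squares on the log-transformed data.
X = np.column_stack([np.ones_like(N), np.log(N), np.log(B), np.log(C)])
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Under this (assumed) form, a larger fitted exponent on a variable means the task is more sensitive to it, which is how a claim like "memorization is more sensitive to effective bit width than exploitation" could be quantified.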

Takeaways, Limitations

Takeaways:
Provides a detailed analysis of how PTQ affects LLMs' knowledge memorization and exploitation capabilities.
By revealing the different sensitivities of knowledge memorization and exploitation, it offers insights for improving PTQ strategies.
Presents an integrated quantitative framework that accounts for model size, effective bit width, calibration set size, and group size.
Provides guidance for developing knowledge-aware quantization strategies that preserve targeted cognitive functions.
Limitations:
The study may be limited to specific LLM architectures and datasets; further research across a wider variety of architectures and datasets is needed.
Further validation of the generalizability of the proposed framework is needed.
There is a lack of PTQ performance evaluation in real-world deployment environments.