Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Beacon: Post-Training Quantization with Integrated Grid Selection

Created by
  • Haebom

Author

Shihao Zhang, Rayan Saab

Outline

This paper addresses quantization, a widely used compression technique for reducing the memory and computational costs of pre-trained large-scale models. In channel-wise post-training quantization (PTQ), a key challenge is selecting an appropriate scaling factor for mapping weight values onto a scaled integer grid. Existing methods typically fix the scale in advance through heuristic tuning or grid search. This paper proposes Beacon, a simple and effective algorithm that eliminates the need for such manual tuning. Beacon performs channel-wise PTQ directly on an unscaled grid and automatically determines the optimal scaling factor by exploiting the geometric properties of scalar quantization. It relies on neither backpropagation nor large calibration sets. Despite its simplicity and tuning-free nature, Beacon achieves performance competitive with state-of-the-art methods, making it a practical solution for efficient model deployment.
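To make the problem setup concrete, here is a minimal sketch of channel-wise round-to-nearest quantization with a baseline grid search over candidate scales, i.e., the kind of per-channel scale tuning Beacon is designed to avoid. This is an illustration of the problem, not Beacon's algorithm; all function names and the candidate-scale range are hypothetical choices for this example.

```python
import numpy as np

def quantize_channel(w, scale, n_bits=4):
    # Round-to-nearest onto the scaled integer grid
    # {-2^(b-1), ..., 2^(b-1)-1} * scale.
    qmax = 2 ** (n_bits - 1) - 1
    qmin = -(2 ** (n_bits - 1))
    q = np.clip(np.round(w / scale), qmin, qmax)
    return q * scale

def grid_search_scale(w, n_bits=4, n_candidates=100):
    # Baseline: sweep candidate scales and keep the one with the
    # smallest L2 reconstruction error (the manual tuning step that
    # tuning-free methods replace).
    base = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    best_scale, best_err = base, np.inf
    for frac in np.linspace(0.3, 1.0, n_candidates):
        s = base * frac
        err = np.linalg.norm(w - quantize_channel(w, s, n_bits))
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64))  # toy weight matrix: 8 output channels

# Channel-wise PTQ: one scale per output channel.
for w in W:
    naive = np.abs(w).max() / (2 ** 3 - 1)  # heuristic max-based scale
    tuned = grid_search_scale(w)
    err_naive = np.linalg.norm(w - quantize_channel(w, naive))
    err_tuned = np.linalg.norm(w - quantize_channel(w, tuned))
    assert err_tuned <= err_naive  # the search never does worse
```

The sweep includes the heuristic max-based scale as a candidate, so the searched scale is never worse; the cost is the repeated evaluation per channel, which grows with model size and grid resolution.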

Takeaways, Limitations

Takeaways:
Beacon is a simple and effective algorithm that automatically determines the optimal scaling factor in channel-wise post-training quantization (PTQ), with no manual tuning.
It achieves performance competitive with state-of-the-art methods without backpropagation or large calibration sets.
It offers a practical solution for efficient model deployment.
Limitations:
Additional experiments and analysis may be needed to assess how well the Beacon algorithm generalizes.
Further evaluation across different model architectures and quantization bit-widths is needed.
It may underperform other state-of-the-art methods on certain model types or tasks.