Daily Arxiv

This page collects and organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Model-Preserving Adaptive Rounding

Created by
  • Haebom

Authors

Albert Tseng, Zhaofeng Sun, Christopher De Sa

Outline

This paper introduces Yet Another Quantization Algorithm (YAQA), an adaptive rounding algorithm that directly accounts for the final output error of the quantized model. Whereas existing quantization algorithms minimize activation error at each layer in isolation, YAQA minimizes the network's end-to-end output error. The results show that YAQA achieves roughly 30% lower error than existing methods such as GPTQ/LDLQ and also outperforms quantization-aware training.
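To make the distinction concrete, here is a minimal sketch (not the authors' code) contrasting the layer-wise proxy objective used by GPTQ/LDLQ-style methods with the end-to-end output error that YAQA targets. The tensor shapes, the round-to-nearest quantizer, and the `model` / `model_q` placeholders are illustrative assumptions.

```python
# Minimal sketch: layer-wise proxy error vs. end-to-end output error.
import torch

torch.manual_seed(0)
W = torch.randn(64, 128)   # original layer weight (illustrative shape)
X = torch.randn(128, 256)  # calibration activations for this layer

def quantize_rtn(w, scale=0.1):
    """Round-to-nearest onto a uniform grid (placeholder quantizer)."""
    return torch.round(w / scale) * scale

W_q = quantize_rtn(W)

# Layer-wise objective minimized by GPTQ/LDLQ-style methods:
# the activation error of this layer in isolation.
layer_err = torch.norm(W @ X - W_q @ X) ** 2

# End-to-end objective considered by YAQA: the error of the *final*
# network output, e.g. KL divergence between original and quantized
# model predictions. `model` / `model_q` stand in for full networks
# (assumption), so this line is left as a comment:
# kl = torch.nn.functional.kl_div(model_q(inputs).log_softmax(-1),
#                                 model(inputs).softmax(-1),
#                                 reduction="batchmean")

print(f"layer-wise proxy error: {layer_err.item():.3f}")
```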

Takeaways, Limitations

Takeaways:
YAQA provides the first end-to-end error bounds for a quantization algorithm.
It characterizes the convergence of adaptive rounding in terms of the Hessian approximation.
It enables efficient Hessian sketching via a Kronecker-factored approximation (see the sketch after this list).
It outperforms GPTQ/LDLQ and quantization-aware training.
It adds no inference overhead.
Limitations:
The paper may not fully detail implementation specifics or computational complexity.
Generalizability to other model types and tasks requires further study.
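As a rough illustration of the Kronecker-factored Hessian sketching mentioned in the takeaways, the following minimal K-FAC-style sketch approximates a layer's Hessian with two small factors built from calibration inputs and output gradients. This is an assumption about the general construction, not the authors' exact estimator; all shapes and variable names are hypothetical.

```python
# K-FAC-style Kronecker-factored approximation of a layer Hessian (sketch).
import torch

torch.manual_seed(0)
n, d_in, d_out = 512, 128, 64
X = torch.randn(n, d_in)   # layer inputs from calibration data
G = torch.randn(n, d_out)  # gradients of the end-to-end loss w.r.t. layer outputs

# Kronecker factors: second moments of inputs and of output gradients.
A = X.T @ X / n            # (d_in, d_in)
B = G.T @ G / n            # (d_out, d_out)

# The full layer Hessian (a (d_in*d_out) x (d_in*d_out) matrix) is
# approximated by the Kronecker product B ⊗ A, so it never has to be
# materialized: for a weight perturbation D (e.g. the rounding error
# W_q - W), the quadratic form factors as
#   vec(D)^T (B ⊗ A) vec(D) = trace(D^T A D B).
D = 0.01 * torch.randn(d_in, d_out)
quad = torch.trace(D.T @ A @ D @ B)
print(f"approximate curvature of the rounding error: {quad.item():.4f}")
```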