Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Robust Training of Neural Networks at Arbitrary Precision and Sparsity

Created by
  • Haebom

Author

Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Li Zhang, Mark Sandler, Andrew Howard

Outline

This paper presents a novel method for addressing the difficulties that the discontinuous operations of quantization and sparsification pose for backpropagation, particularly in the ultra-low-precision and sparse regimes. The conventional straight-through estimator (STE) can compromise learning because of the mismatch between the forward pass, which applies quantization, and the backward pass, which ignores it. The paper addresses this issue by introducing a denoising dequantization transform derived from a principled ridge regression objective. This transform creates explicit corrective gradient paths that account for the quantization error the surrogate gradient ignores, keeping the STE robust throughout training. The same principle is extended to sparsity by treating it as a special form of quantization that maps negligible values to zero. The resulting unified framework enables stable training of existing models across a range of precisions and sparsity levels, and achieves robust training of fully binary (A1W1) and sparse sub-1-bit networks where other methods fail. It delivers state-of-the-art results and offers a path toward theoretically grounded, ultra-efficient neural networks.
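
To make the mechanism above more concrete, here is a minimal PyTorch sketch (not taken from the paper) of the two ingredients the summary describes: a straight-through estimator that quantizes in the forward pass while passing gradients through unchanged, and a closed-form ridge-regression rescaling used as an illustrative stand-in for the paper's denoising dequantization transform. The function names `quantize_ste` and `ridge_dequantize` and the regularization strength `lam` are assumptions for illustration, not the authors' implementation.

```python
import torch

def quantize_ste(x, num_bits=1):
    """Quantize x in the forward pass while letting gradients pass through
    unchanged (straight-through estimator)."""
    if num_bits == 1:
        q = torch.sign(x)                                   # hard binarization
    else:
        levels = 2 ** num_bits - 1
        q = torch.round(x.clamp(0.0, 1.0) * levels) / levels
    # Forward uses q; backward treats the mapping as the identity w.r.t. x.
    return x + (q - x).detach()

def ridge_dequantize(x, q, lam=1e-3):
    """Illustrative dequantization: fit a single scale s mapping the quantized
    tensor q back toward the full-precision tensor x by solving the ridge
    regression  min_s ||x - s*q||^2 + lam*s^2  in closed form.
    This is a hypothetical stand-in, not the paper's exact transform."""
    s = (q * x).sum() / ((q * q).sum() + lam)
    return s * q

# Usage sketch: binarize weights with the STE, rescale, and backpropagate.
w = torch.randn(64, 128, requires_grad=True)
q = quantize_ste(w, num_bits=1)
w_hat = ridge_dequantize(w.detach(), q)
loss = ((w_hat - w) ** 2).mean()
loss.backward()   # gradients reach w through the straight-through path
```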

Takeaways, Limitations

Takeaways:
Presents a new method that effectively addresses the backpropagation problems arising from quantization and sparsification.
Demonstrates that ultra-low-precision and sparse networks can be trained robustly.
Achieves state-of-the-art performance on fully binary (A1W1) and sparse sub-1-bit networks.
Opens new possibilities for theoretically grounded, ultra-efficient neural network design.
Limitations:
Lacks an analysis of the computational cost and memory usage of the proposed method.
Further experiments are needed to evaluate generalization across different network architectures and datasets.
Further research is needed on the optimal parameter settings of the proposed ridge regression-based dequantization transform.