Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SGD Convergence under Stepsize Shrinkage in Low-Precision Training

Created by
  • Haebom

Author

Vincent-Daniel Yun

Outline

This paper analyzes how the gradient shrinkage induced by quantization affects the convergence of stochastic gradient descent (SGD) in low-precision training, which has become important for reducing the computational and memory costs of large-scale deep learning. We study the convergence of SGD under a gradient shrinkage model in which each stochastic gradient is shrunk by a factor $q_k \in (0,1]$. We show that this shrinkage replaces the nominal step size $\mu_k$ with an effective step size $\mu_k q_k$, which slows convergence whenever $q_{\min} < 1$. Under the usual smoothness and bounded-variance assumptions, we demonstrate that low-precision SGD still converges, but at a slower rate determined by $q_{\min}$ and with a higher steady-state error level due to quantization effects. In short, we analyze how low numerical precision slows learning by treating it as gradient shrinkage within the standard SGD convergence setting.
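As a concrete illustration of the effective step size $\mu_k q_k$, here is a minimal sketch (not from the paper) that simulates SGD on a simple quadratic objective with and without gradient shrinkage. The objective, dimensionality, noise level, and the range from which $q_k$ is drawn are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_with_shrinkage(q_min, q_max, steps=200, mu=0.02, noise_std=0.1, dim=10):
    """SGD on f(x) = 0.5 * ||x||^2 with per-step shrinkage q_k ~ Uniform[q_min, q_max].

    All constants here are illustrative assumptions, not values from the paper.
    """
    x = np.full(dim, 5.0)  # assumed starting point
    losses = []
    for _ in range(steps):
        grad = x + noise_std * rng.standard_normal(dim)  # noisy gradient of f
        q_k = rng.uniform(q_min, q_max)  # shrinkage factor, q_k in (0, 1]
        x = x - mu * q_k * grad          # effective step size is mu * q_k
        losses.append(0.5 * float(x @ x))
    return losses

full_precision = sgd_with_shrinkage(q_min=1.0, q_max=1.0)  # no shrinkage: q_k = 1
low_precision = sgd_with_shrinkage(q_min=0.5, q_max=0.8)   # assumed shrinkage range

# With q_min < 1 the effective step size is smaller, so the loss decreases
# more slowly, matching the slower rate predicted by the analysis.
print(f"loss after 200 steps, q_k = 1:         {full_precision[-1]:.4f}")
print(f"loss after 200 steps, q_k in [.5, .8]: {low_precision[-1]:.4f}")
```

Note that this sketch models only the shrinkage factor; the additive quantization noise that raises the steady-state error level in the paper's analysis is not simulated here.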

Takeaways, Limitations

Takeaways: The gradient shrinkage model theoretically explains why low-precision SGD converges more slowly and settles at a higher steady-state error, providing a theoretical foundation for improving low-precision training strategies.
Limitations: The analysis does not account for the diverse quantization techniques and specific hardware environments of real-world deep learning models. Experimental validation is needed to determine how well the theoretical results reflect actual performance, and further analysis is needed to assess how well the assumptions on the distribution of $q_k$ reflect real-world behavior.