Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches

Created by
  • Haebom

Author

Yishun Lu, Wesley Armour

Outline

This paper proposes the Fisher-Orthogonal Projection (FOP) technique to address optimization difficulties in large-batch training. The large, high-bandwidth memory of modern GPUs makes it possible to train with mini-batches of tens of thousands of samples, but existing optimization techniques struggle to scale effectively to such batch sizes. Specifically, first-order methods are prone to converging to suboptimal minima because the gradient noise that helps escape them diminishes as the batch size increases, while KFAC, a second-order method, requires excessive damping to remain stable, which degrades its performance. FOP uses the gradients of two sub-batches to construct a variance-aware update direction. This restores the effectiveness of second-order optimization even at large batch sizes, enabling improved generalization and faster convergence.
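For intuition, the sketch below shows in heavily simplified form how a variance-aware update direction can be built from two sub-batch gradients and a Fisher-metric projection. It operates on plain NumPy vectors rather than a real network, and the function name fop_update, the weighting knob alpha, and the damping values are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal conceptual sketch of an FOP-style update (assumptions noted above).
import numpy as np

def fop_update(g1, g2, fisher, alpha=1.0, damping=1e-3, eps=1e-12):
    """Combine two sub-batch gradients into a variance-aware natural-gradient step.

    g1, g2  : gradients computed on the two halves of a large batch
    fisher  : Fisher information matrix approximation (positive semi-definite)
    alpha   : weight on the Fisher-orthogonal variance component (assumed knob)
    """
    g_avg = 0.5 * (g1 + g2)          # mean gradient over the full batch
    g_diff = g1 - g2                 # captures gradient variance between sub-batches

    # Project the difference so it is orthogonal to the mean gradient
    # under the Fisher metric <x, y>_F = x^T F y.
    Fg = fisher @ g_avg
    coef = (g_diff @ Fg) / (g_avg @ Fg + eps)
    g_perp = g_diff - coef * g_avg   # Fisher-orthogonal component of the variance

    # Precondition the combined direction with a damped Fisher inverse,
    # as a second-order (natural-gradient) method would.
    F_damped = fisher + damping * np.eye(fisher.shape[0])
    return np.linalg.solve(F_damped, g_avg + alpha * g_perp)

# Toy usage: random sub-batch gradients and a synthetic positive-definite Fisher matrix.
rng = np.random.default_rng(0)
d = 8
A = rng.normal(size=(d, d))
fisher = A @ A.T / d
g1, g2 = rng.normal(size=d), rng.normal(size=d)
print(fop_update(g1, g2, fisher))
```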

Takeaways, Limitations

Takeaways:
Presents a novel method for improving deep learning training efficiency at large batch sizes.
Addresses the performance degradation of second-order optimization methods through FOP.
Suggests the potential for better generalization and faster convergence.
Limitations:
Further experiments are needed to confirm that FOP's benefits hold across different types of models and datasets.
A detailed analysis of FOP's computational cost and memory requirements is needed.
Research on tuning the various hyperparameters is needed.