Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation

Created by
  • Haebom

Authors

Dahun Shin, Dongyeop Lee, Jinseok Chung, Namhoon Lee

Outline

This paper addresses the phenomenon that approximate second-order optimization methods often generalize worse than first-order methods. Analyzing the loss landscape, the authors show that existing second-order methods tend to converge to sharper minima than SGD. They therefore propose Sassha, a novel second-order optimization method that explicitly reduces the sharpness of the solution to improve generalization. Sassha stabilizes the computation of the approximate Hessian throughout optimization and, for efficiency, incorporates delayed (lazy) Hessian updates into its sharpness-minimization design. Across a range of deep learning experiments, Sassha achieves better generalization than competing methods, and the paper provides a comprehensive analysis covering convergence, robustness, stability, efficiency, and cost.
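To make the ingredients concrete, here is a minimal toy sketch of a sharpness-aware, Hessian-preconditioned update on a quadratic loss. This is not the authors' implementation of Sassha: the hyperparameters (`lr`, `rho`, the refresh interval `k`), the SAM-style ascent perturbation, and the square-root-of-absolute-Hessian preconditioner are all illustrative assumptions chosen to mirror the three ideas in the summary (sharpness minimization, stabilized Hessian, delayed Hessian updates).

```python
import numpy as np

def grad(w, h):
    # Gradient of the toy quadratic loss L(w) = 0.5 * sum(h * w**2).
    return h * w

def sassha_like_step(w, h_diag_est, t, h_true, lr=0.1, rho=0.05, k=10):
    g = grad(w, h_true)
    # Sharpness-aware perturbation (SAM-style ascent step of radius rho).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_adv = grad(w + eps, h_true)
    # Delayed Hessian update: refresh the diagonal estimate every k steps.
    if t % k == 0:
        h_diag_est = h_true.copy()  # stand-in for e.g. a Hutchinson estimate
    # Stabilized preconditioner: square root of the absolute Hessian,
    # which stays well-defined even if estimated curvature is negative.
    precond = np.sqrt(np.abs(h_diag_est)) + 1e-8
    w_new = w - lr * g_adv / precond
    return w_new, h_diag_est

h_true = np.array([4.0, 1.0, 0.25])   # toy diagonal Hessian
w = np.array([1.0, 1.0, 1.0])
h_est = np.ones_like(w)
for t in range(200):
    w, h_est = sassha_like_step(w, h_est, t, h_true)
# w ends up in a small neighborhood of the minimum at 0 (radius ~ rho,
# since the fixed-size perturbation never fully vanishes).
```

Note the fixed perturbation radius `rho` means the iterates settle near, rather than exactly at, the minimum; the actual method's convergence behavior is analyzed in the paper.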

Takeaways, Limitations

Takeaways:
Identifies sharpness minimization as a new solution to the generalization degradation of approximate second-order optimization methods.
Sassha shows superior generalization performance compared to existing methods.
Employs delayed Hessian updates in its design for efficiency.
Provides a comprehensive analysis covering convergence, robustness, stability, efficiency, and cost.
Limitations:
The generality of the experimental results presented in the paper requires further verification.
Sassha's performance may be biased toward certain problem types or network architectures.
A more detailed analysis of Sassha's computational complexity and memory requirements is needed.