Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Kourkoutas-Beta: A Sunspike-Driven Adam Optimizer with Desert Flair

Created by
  • Haebom

Author

Stavros C. Kassinos

Outline

This paper addresses transformer neural networks used as data-driven surrogate models for partial differential equations (PDEs), where training samples drawn from varying boundary and initial conditions produce erratic losses and spiky gradients, and physics-informed neural networks (PINNs), where stiff composite losses amplify these effects. To cope with this, the paper proposes Kourkoutas-Beta, an Adam-style optimizer that replaces the fixed second-moment discount rate β₂ with a layer-wise dynamic value driven by a bounded "sunspike" ratio: the current pooled gradient norm divided by an exponential moving average (EMA) of past norms. Gradient spikes push β₂ down toward β₂_min, while calm phases keep it near β₂_max. Optional features include leaky-AMSGrad decay, trust-region clipping (max_ratio), adaptive tiny terms, and several bias-correction modes ("none", "beta2max", "exact"). Kourkoutas-Beta is evaluated on four setups: Heat2D (a Transformer PDE surrogate), Heat3D (a 3D heat-conduction PINN), a lightweight MLX synthetic task with jitter and rare trigger bursts, and a character-level Transformer on the 30 MB enwik8 dataset (small-enwik8). Across these, it improves stability and final loss compared to fixed-β₂ Adam; in particular, on small-enwik8 it reduces bits-per-character by about 38% versus Adam-0.95 and about 58% versus Adam-0.999. Kourkoutas-Beta works as a drop-in replacement for Adam and improves robustness under spiky gradients while retaining Adam-style convergence guarantees.
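
To make the mechanism concrete, here is a minimal per-layer update sketch. It assumes a simple s/(1+s) squashing of the spike ratio and illustrative hyperparameter defaults, and it omits the optional features (leaky-AMSGrad decay, trust-region clipping, the "exact" bias-correction mode), so it is a sketch of the idea rather than the paper's implementation.

```python
# Minimal sketch of a sunspike-driven dynamic beta2 inside an Adam-style step.
# NOT the paper's implementation: the s/(1+s) squashing, ema_alpha, and the
# default values below are illustrative assumptions.
import numpy as np

def kourkoutas_beta_step(param, grad, state, lr=1e-3, beta1=0.9,
                         beta2_min=0.88, beta2_max=0.999,
                         ema_alpha=0.9, eps=1e-8):
    """One update for a single layer; `state` is that layer's optimizer state."""
    # Pooled gradient norm for this layer.
    g_norm = np.linalg.norm(grad)

    # Compare against an EMA of past pooled norms to get the "sunspike" ratio,
    # then squash it into [0, 1).
    prev_ema = state.get("norm_ema", g_norm)
    raw_ratio = g_norm / (prev_ema + eps)
    sunspike = raw_ratio / (1.0 + raw_ratio)          # bounded to [0, 1)
    state["norm_ema"] = ema_alpha * prev_ema + (1 - ema_alpha) * g_norm

    # Spikes (sunspike near 1) pull beta2 toward beta2_min;
    # calm phases (sunspike near 0) keep it near beta2_max.
    beta2 = beta2_max - sunspike * (beta2_max - beta2_min)

    # Standard Adam moment updates, but with the dynamic beta2.
    state["m"] = beta1 * state.get("m", np.zeros_like(param)) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", np.zeros_like(param)) + (1 - beta2) * grad ** 2
    state["t"] = state.get("t", 0) + 1

    # Bias correction; here we assume the "beta2max" mode corrects with
    # beta2_max, while the "exact" mode would track the running product of
    # the time-varying beta2 values instead.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2_max ** state["t"])

    return param - lr * m_hat / (np.sqrt(v_hat) + eps)
```

Because the ratio is computed from each layer's own pooled gradient norm and EMA, every layer keeps its own state and its own effective β₂ at each step.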

Takeaways, Limitations

Takeaways:
A novel optimization technique is presented that is effective for transformer-based physics problems plagued by spiky, steep gradients.
Improved stability and final performance of the Adam optimizer.
Performance improvements were observed across diverse problems (PDE surrogate, PINN, synthetic task, language model).
It can be used as a drop-in replacement for Adam, with minimal runtime overhead (see the usage sketch after this list).
Limitations:
Further research is needed on the generalization performance of the proposed optimizer.
Further analysis of hyperparameter tuning is needed.
Its applicability to more complex, larger-scale physics problems still needs to be verified.
How performance changes when hyperparameters tuned for one problem are applied to other problems also needs to be analyzed.
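
The drop-in claim suggests usage along the lines below; the class name KourkoutasBeta and its keyword arguments are hypothetical, since the summary does not specify the released API, and only the fixed-β₂ Adam line is standard PyTorch.

```python
# Hypothetical drop-in swap, mirroring torch.optim.Adam; the KourkoutasBeta
# class name and its keyword arguments are assumptions, not a published API.
import torch

model = torch.nn.Linear(16, 1)

# Baseline: Adam with a fixed second-moment discount rate beta2.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Sketch of the replacement: beta2 varies per layer between beta2_min and
# beta2_max according to the sunspike ratio.
# opt = KourkoutasBeta(model.parameters(), lr=1e-3,
#                      beta2_min=0.88, beta2_max=0.999)
```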