This paper focuses on the use of Transformer neural networks in data-driven partial differential equation (PDE) surrogate models, where training samples drawn from varying boundary and initial conditions lead to erratic losses and spiky gradients, and in physics-informed neural networks (PINNs), where stiff composite losses amplify these effects. To address this, we propose Kourkoutas-Beta, an Adam-style optimizer that replaces the fixed second-moment discount rate β₂ with a layer-wise dynamic value driven by a bounded "sunspike" ratio: the current pooled gradient norm divided by an exponential moving average (EMA) of past norms. Spikes push β₂ down toward β₂_min, while calm phases keep it near β₂_max. Options include leaky-AMSGrad (decay), trust-region clipping (max_ratio), an adaptive tiny term, and several bias-correction modes ("none", "beta2max", "exact"). We evaluate Kourkoutas-Beta on four setups: Heat2D (a Transformer PDE surrogate), Heat3D (a 3D heat-conduction PINN), a lightweight MLX synthetic task with jitter and rare trigger bursts, and a character-level Transformer on the 30 MB enwik8 dataset (small-enwik8), and show that it improves stability and final loss compared to fixed-β₂ Adam. In particular, on small-enwik8 it lowers bits-per-character by roughly 38% relative to Adam-0.95 and roughly 58% relative to Adam-0.999. Kourkoutas-Beta is a drop-in method that improves robustness under spiky gradients while preserving Adam-style convergence guarantees.
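To make the core mechanism concrete, the following is a minimal PyTorch-style sketch of how a bounded sunspike ratio could drive a per-layer β₂. It is illustrative only: the names (dynamic_beta2, step_layer), the default values (beta2_min=0.88, ema_alpha=0.93), and the particular squashing of the ratio are assumptions rather than the paper's exact formulation, and bias correction plus the optional features (leaky-AMSGrad, max_ratio clipping, the adaptive tiny term) are omitted.

```python
import torch

def dynamic_beta2(grad_norm, norm_ema, beta2_min=0.88, beta2_max=0.999, eps=1e-8):
    """Map a pooled gradient norm and its EMA to a beta2 between beta2_min and beta2_max.
    The squashing below is an assumption for illustration."""
    # How far the current norm exceeds its running average (zero when calm).
    excess = torch.clamp(grad_norm / (norm_ema + eps) - 1.0, min=0.0)
    sunspike = excess / (1.0 + excess)  # bounded to [0, 1)
    # Calm phases (sunspike ~ 0) keep beta2 at beta2_max;
    # large spikes (sunspike -> 1) pull beta2 toward beta2_min.
    return beta2_max - (beta2_max - beta2_min) * sunspike

def step_layer(param, grad, state, lr=1e-3, beta1=0.9, ema_alpha=0.93, eps=1e-8):
    """One Adam-style update for a single layer with a dynamic beta2 (no bias correction)."""
    g_norm = grad.norm()
    state["norm_ema"] = ema_alpha * state["norm_ema"] + (1 - ema_alpha) * g_norm
    beta2 = dynamic_beta2(g_norm, state["norm_ema"])
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad         # first moment, as in Adam
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad  # second moment with dynamic beta2
    param -= lr * state["m"] / (state["v"].sqrt() + eps)

# Toy usage on a single parameter tensor
w = torch.zeros(4)
g = torch.randn(4)
state = {"norm_ema": g.norm(), "m": torch.zeros_like(w), "v": torch.zeros_like(w)}
step_layer(w, g, state)
```

With all dynamic behavior disabled (sunspike held at zero), the update above reduces to plain Adam with β₂ = β₂_max, which is what makes the method drop-in.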