This paper proposes a novel attention mechanism to address the numerical instability and performance degradation of conventional softmax attention at long inference token lengths. We decompose the softmax operation into a nonlinear positive transformation followed by $l_1$-normalization, and show that the $l_1$-normalization step is essential for maintaining model performance. In the first step, we replace the exponential with a numerically stable softplus activation function and introduce a dynamic scaling factor derived from entropy invariance, which already outperforms conventional softmax attention. In the second step, we introduce a reweighting mechanism that sharpens the attention distribution, amplifying important weights and diminishing weak ones so that attention focuses more effectively on relevant tokens. Combining these two steps ensures numerical stability and achieves strong results on long-context extraction tasks and standard downstream benchmarks, while maintaining a nearly constant validation loss even at 16x the training length and dramatically improving length extrapolation performance.
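The following Python sketch illustrates the two steps described above in a single-head, unbatched setting. The function names, the constant `gamma` standing in for the entropy-invariance scaling factor, and the power-based reweighting rule are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the two-step attention described above (assumptions noted inline).
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)), computed via logaddexp.
    return np.logaddexp(0.0, x)

def softplus_attention(q, k, v, gamma=1.0):
    """Step 1: nonlinear positive transformation (softplus) + l_1 normalization.

    q, k, v: arrays of shape (seq_len, d).
    gamma:   placeholder for the dynamic, length-dependent scaling factor
             (the paper derives it from entropy invariance; a constant here).
    """
    d = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d)              # scaled dot-product scores
    pos = softplus(gamma * scores)               # non-negative transformation
    weights = pos / pos.sum(-1, keepdims=True)   # l_1 normalization: rows sum to 1
    return weights @ v, weights

def reweight(weights, alpha=2.0):
    """Step 2: sharpen the distribution so large weights grow and small ones shrink.
    An element-wise power followed by re-normalization is one simple way to do this;
    the exponent alpha is a hypothetical knob, not the paper's exact rule."""
    sharpened = weights ** alpha
    return sharpened / sharpened.sum(-1, keepdims=True)

# Usage: tiny random example.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
out, w = softplus_attention(q, k, v)
w_sharp = reweight(w)
out_sharp = w_sharp @ v                  # attention output with sharpened weights
print(np.allclose(w.sum(-1), 1.0))       # True: l_1-normalized rows
print(np.all(w_sharp.max(-1) >= w.max(-1)))  # True: reweighting concentrates mass
```

The power-and-renormalize reweighting guarantees that the largest weight in each row can only grow, which is one concrete way to realize the "amplify important, diminish weak" behavior the abstract describes.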