Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Convergence and Generalization of Anti-Regularization for Parametric Models

Created by
  • Haebom

Authors

Dongseok Kim, Wonjun Jeong, Gisung Oh

Outline

This paper proposes "anti-regularization," a technique that intentionally boosts a model's expressive power in small-data regimes. Anti-regularization adds a reversed (sign-flipped) reward term to the loss function, strengthening expressiveness at small sample sizes and fading the intervention out as the sample size grows, following a power-law decay schedule. To keep the intervention stable, the authors formulate spectral safety conditions and trust-region constraints and design a lightweight safety mechanism that combines a projection operator with gradient clipping. The theoretical analysis covers the linear smoother and neural tangent kernel regimes and gives practical guidance on choosing the decay exponent via the empirical risk-variance trade-off. Experiments show that anti-regularization mitigates underfitting in both regression and classification while preserving generalization and improving calibration, and ablations confirm that the decay schedule and safety mechanism are essential for avoiding overfitting and instability. The authors also propose a degrees-of-freedom target schedule that keeps per-sample complexity constant. Anti-regularization is a simple, reproducible procedure that integrates seamlessly into standard empirical risk minimization pipelines, enabling robust learning under limited data and resource constraints by intervening only when necessary and withdrawing otherwise.
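As a minimal sketch (not the authors' implementation), the core mechanics can be illustrated on a ridge-style linear model: a reversed L2 term whose coefficient alpha0 * n^(-gamma) decays as a power law in the sample size n, combined with gradient clipping and a norm-ball projection standing in for the trust-region safety mechanism. All names and hyperparameter values (alpha0, gamma, clip_norm, radius, the learning rate) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def anti_regularized_loss(w, X, y, alpha0=0.1, gamma=1.0):
    """Squared-error risk plus a reversed (negative) L2 term whose weight
    decays as a power law in the sample size n = len(y)."""
    n = len(y)
    residual = X @ w - y
    empirical_risk = 0.5 * np.mean(residual ** 2)
    lam = alpha0 * n ** (-gamma)          # power-law decay schedule
    # The minus sign is the "anti" part: it rewards, rather than penalizes,
    # larger weights. Stability then hinges on lam staying small relative to
    # the curvature of the empirical risk (the spectral safety condition).
    return empirical_risk - 0.5 * lam * np.dot(w, w)

def anti_regularized_grad(w, X, y, alpha0=0.1, gamma=1.0):
    """Gradient of the anti-regularized objective above."""
    n = len(y)
    lam = alpha0 * n ** (-gamma)
    return X.T @ (X @ w - y) / n - lam * w

def clip_grad(g, max_norm=1.0):
    """Gradient clipping: rescale g if its norm exceeds max_norm."""
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)

def project_to_ball(w, radius=10.0):
    """Trust-region-style projection: keep w inside an L2 ball."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

# Toy usage on a small synthetic regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=30)
w = np.zeros(5)
for _ in range(500):
    g = clip_grad(anti_regularized_grad(w, X, y))
    w = project_to_ball(w - 0.05 * g)
```

Because n = 30 here, the reversed penalty is non-negligible; with a much larger n the coefficient alpha0 * n^(-gamma) shrinks toward zero and the update reduces to plain empirical risk minimization, which is the intended fade-out behavior.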

Takeaways, Limitations

Takeaways:
Presents a novel method that effectively alleviates model underfitting on small datasets.
Demonstrates improved generalization performance and calibration in both regression and classification problems.
A simple and reproducible procedure that integrates easily into existing learning pipelines.
Proposes a degrees-of-freedom target schedule that keeps per-sample complexity constant.
Limitations:
The theoretical analysis is limited to the linear smoother and neural tangent kernel regimes; analysis of a broader range of models is needed.
Guidance on choosing the decay exponent and the degrees-of-freedom target schedule is limited, so experimental tuning may be required.
The experiments may be confined to a narrow set of datasets; additional experiments on more diverse datasets are needed.