Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility

Created by
  • Haebom

Authors

Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal

Outline

This paper addresses the challenge of simultaneously achieving robustness and resource efficiency, two highly desirable properties in modern machine learning models. We show that large learning rates help achieve both robustness to spurious correlations and network compressibility. Large learning rates yield desirable representational properties, such as invariant feature utilization, class separability, and activation sparsity, and across a variety of spurious-correlation datasets, models, and optimizers, they achieve these properties consistently where other hyperparameters and regularization methods fall short. We further present strong evidence that the success of large learning rates on standard classification tasks is linked to their ability to address hidden or rare spurious correlations in the training data. Our investigation of the underlying mechanisms highlights the importance of confident mispredictions on bias-conflicting samples under large learning rates.
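
The activation-sparsity claim can be made concrete with a small experiment. Below is a minimal, hypothetical sketch in PyTorch, not the authors' code: it trains the same small ReLU network on synthetic data with plain SGD at a low and a large learning rate, then reports the fraction of near-zero hidden activations. The architecture, data, and learning-rate values are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 20)   # synthetic inputs
y = (X[:, 0] > 0).long()   # labels depend only on feature 0

def train_and_measure(lr, epochs=200):
    """Train a small MLP at the given learning rate; return activation sparsity."""
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    with torch.no_grad():
        acts = model[1](model[0](X))                      # post-ReLU hidden activations
        sparsity = (acts <= 1e-6).float().mean().item()   # fraction of (near-)zero units
    return sparsity

for lr in (0.01, 1.0):
    print(f"lr={lr}: activation sparsity = {train_and_measure(lr):.1%}")
```

Per the paper's findings, one would expect the large-learning-rate run to show higher activation sparsity (and hence greater prunability), though results on toy data like this will vary with the seed and architecture.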

Takeaways, Limitations

Takeaways: By demonstrating that large learning rates are effective at simultaneously improving model robustness and resource efficiency, we present an approach that can complement, or even replace, existing regularization techniques. We also link the success of large learning rates on standard classification tasks to the resolution of hidden spurious correlations, offering a new perspective on learning-rate selection.
Limitations: The study is based on experimental results for a specific set of datasets and models, so generalizability to other datasets and models requires further verification. A deeper analysis of the mechanisms underlying the effect of large learning rates is also needed. Finally, while the importance of confident mispredictions on bias-conflicting samples is suggested, this effect needs to be quantified and explained more precisely; a sketch of one possible metric follows.
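
As a hypothetical illustration of the quantification this limitation calls for, the helper below measures how often a trained classifier is highly confident on the samples it misclassifies. The function name, confidence threshold, and interface are assumptions for illustration, not from the paper.

```python
import torch
import torch.nn.functional as F

def confident_error_rate(model, x, y, threshold=0.9):
    """Fraction of misclassified samples predicted with confidence >= threshold.

    Intended use (per the paper's framing): evaluate on bias-conflicting
    samples and compare models trained with small vs. large learning rates.
    """
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
        conf, pred = probs.max(dim=1)   # per-sample confidence and predicted class
        wrong = pred != y               # mask of misclassified samples
        if wrong.sum() == 0:
            return 0.0
        return (conf[wrong] >= threshold).float().mean().item()
```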