Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

The Importance of Being Lazy: Scaling Limits of Continual Learning

Created by
  • Haebom

Authors

Jacopo Graldi, Alessandro Breccia, Giulia Lanzillotta, Thomas Hofmann, Lorenzo Noci

Outline

This paper addresses the incomplete understanding of why neural networks struggle to learn in non-stationary environments and of catastrophic forgetting (CF). We systematically investigate the effects of model scale and the degree of feature learning on continual learning. We reconcile conflicting findings from previous research by distinguishing between the lazy and rich training regimes through a variable parameterization of the architecture, and we show that increasing model width is beneficial only when it reduces the amount of feature learning, i.e., when it makes training lazier. Using the framework of dynamical mean field theory, we study the infinite-width dynamics of the model in the feature-learning regime and characterize CF, extending previous theoretical results that were limited to the lazy regime. We investigate the intricate relationship among feature learning, task non-stationarity, and forgetting, finding that strong feature learning is beneficial only when tasks are highly similar. We identify a transition, modulated by task similarity, in which the model leaves an effectively lazy regime with low forgetting and enters a rich regime with significant forgetting. Finally, we show that neural networks achieve optimal performance at a critical level of feature learning, which depends on task non-stationarity and transfers across model scales. This study provides a unified perspective on the role of scale and feature learning in continual learning.
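As a concrete, deliberately simplified illustration of the lazy-versus-rich distinction discussed above, the sketch below uses the well-known output-scaling trick (a centred network output multiplied by a factor alpha, with the learning rate divided by alpha squared) rather than the paper's own parameterization; the function name, toy task, and hyperparameters are all invented for illustration. Larger alpha should leave the first-layer weights close to their initialization (lazy regime), while smaller alpha should move them substantially (rich, feature-learning regime).

```python
# Minimal toy sketch (not the paper's exact setup): the alpha-scaling of
# Chizat & Bach, where a centred network output is multiplied by alpha and the
# learning rate is divided by alpha**2.  Large alpha keeps the first-layer
# weights nearly frozen (lazy regime); small alpha forces them to move a lot
# (rich, feature-learning regime).
import torch


def feature_movement(alpha, width=512, d=10, steps=1000, base_lr=0.5):
    """Train a centred, alpha-scaled two-layer net on a toy regression task and
    return the relative change of the first-layer weights, a crude proxy for
    how much feature learning took place."""
    torch.manual_seed(0)
    x = torch.randn(256, d)
    y = torch.sin(x.sum(dim=1, keepdim=True))            # O(1) toy target

    w1 = torch.randn(d, width, requires_grad=True)
    w2 = torch.randn(width, 1, requires_grad=True)
    w1_init = w1.detach().clone()

    def f(a, b):                                          # NTK-style forward pass
        return torch.tanh(x @ a / d**0.5) @ b / width**0.5

    f0 = f(w1, w2).detach()                               # model is alpha * (f - f0)

    opt = torch.optim.SGD([w1, w2], lr=base_lr / alpha**2)  # lazy-limit lr scaling
    for _ in range(steps):
        loss = ((alpha * (f(w1, w2) - f0) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    return ((w1.detach() - w1_init).norm() / w1_init.norm()).item()


for alpha in (0.1, 1.0, 10.0, 100.0):
    print(f"alpha = {alpha:6.1f}   relative first-layer change = {feature_movement(alpha):.4f}")
```

The printed relative weight change is only a rough proxy for feature learning in a finite-width toy network; the paper's characterization relies on dynamical mean field theory in the infinite-width limit rather than experiments of this kind.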

Takeaways, Limitations

Takeaways:
Provides an integrated understanding of the interplay between model scale and feature learning in continual learning.
Resolves contradictions in prior work by distinguishing between the lazy and rich training regimes.
Shows that the optimal level of feature learning depends on task non-stationarity and carries over across model scales (see the toy sketch at the end of this section).
Analyzes continual learning dynamics in infinite-width neural networks using dynamical mean field theory.
Limitations:
This study is based on theoretical analysis and may lack experimental verification for practical applications.
The results may be specific to the particular architectures and task types studied.
Further research may be needed on quantitative measures of task similarity.
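To make the interplay between feature learning and forgetting a little more tangible, here is a hypothetical two-task toy experiment built on the same alpha-scaled network as in the sketch above (again not the authors' protocol; the tasks, seeds, and hyperparameters are invented for illustration): train on task A, then on task B, and record how much the task-A loss degrades for a few values of alpha.

```python
# Rough continual-learning follow-up to the sketch above (a toy, not the
# paper's protocol): train the centred, alpha-scaled network on task A, then
# on task B, and report the task-A loss before and after task B as a crude
# forgetting measure, for a few values of the laziness scale alpha.
import torch


def make_task(seed, n=256, d=10):
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, d, generator=g)
    w = torch.randn(d, 1, generator=g)
    return x, torch.sin(x @ w)


def run(alpha, width=512, d=10, steps=1000, base_lr=0.5):
    torch.manual_seed(0)
    w1 = torch.randn(d, width, requires_grad=True)
    w2 = torch.randn(width, 1, requires_grad=True)

    def f(x):                                             # NTK-style forward pass
        return torch.tanh(x @ w1 / d**0.5) @ w2 / width**0.5

    (xa, ya), (xb, yb) = make_task(1), make_task(2)
    fa0, fb0 = f(xa).detach(), f(xb).detach()             # centring offsets at init

    def train(x, y, f0):
        opt = torch.optim.SGD([w1, w2], lr=base_lr / alpha**2)
        for _ in range(steps):
            loss = ((alpha * (f(x) - f0) - y) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()

    def loss_on(x, y, f0):
        with torch.no_grad():
            return ((alpha * (f(x) - f0) - y) ** 2).mean().item()

    train(xa, ya, fa0)
    before = loss_on(xa, ya, fa0)                         # task-A loss right after task A
    train(xb, yb, fb0)
    after = loss_on(xa, ya, fa0)                          # task-A loss after task B
    return before, after


for alpha in (0.1, 1.0, 10.0):
    before, after = run(alpha)
    print(f"alpha = {alpha:5.1f}   task-A loss after A: {before:.4f}   after B: {after:.4f}")
```

Watching how the degradation varies with alpha, and with how similar the two toy tasks are made, gives a hands-on feel for the regimes that the paper characterizes analytically.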