Daily Arxiv

This page curates AI-related papers published worldwide.
The content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

On Understanding the Dynamics of Model Capacity in Continual Learning

Created by
  • Haebom

Author

Supriyo Chakraborty, Krishnan Raghavan

Outline

This paper studies the stability-plasticity dilemma in continual learning (CL) through the notion of effective model capacity (CLEMC) for neural networks. The authors develop a difference equation that models how the interaction between the neural network, the task data, and the optimization procedure evolves, and show that the effective capacity, and hence the stability-plasticity trade-off, is inherently non-stationary. Through extensive experiments across various architectures (including feedforward networks, convolutional neural networks, graph neural networks, and large-scale Transformer-based language models with millions of parameters), they demonstrate that the network's ability to represent new tasks diminishes as the new task distribution drifts away from the distributions of previous tasks.
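The following is a minimal, hypothetical sketch (not the paper's code or the actual CLEMC formulation): it trains a small network on a sequence of synthetic tasks whose input distribution drifts, and tracks a crude proxy for the stability-plasticity trade-off, namely loss on the newest task (plasticity) versus average loss on earlier tasks (stability). The task generator `make_task`, the shift schedule, and the network size are all illustrative assumptions.

```python
# Hypothetical illustration only: a drifting task sequence and a simple
# stability/plasticity proxy, not the paper's CLEMC difference equation.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(shift, n=256, dim=8):
    """Synthetic regression task; `shift` moves the input distribution."""
    x = torch.randn(n, dim) + shift
    w = torch.randn(dim, 1)
    y = x @ w + 0.1 * torch.randn(n, 1)
    return x, y

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

tasks = []
for k, shift in enumerate([0.0, 0.5, 1.0, 2.0, 4.0]):  # growing distribution shift
    tasks.append(make_task(shift))
    x_new, y_new = tasks[-1]
    for _ in range(200):  # train on the newest task only (no replay)
        opt.zero_grad()
        loss_fn(model(x_new), y_new).backward()
        opt.step()
    with torch.no_grad():
        plasticity = loss_fn(model(x_new), y_new).item()  # fit to the new task
        stability = sum(loss_fn(model(x), y).item()
                        for x, y in tasks[:-1]) / max(k, 1)  # drift on old tasks
    print(f"task {k}: new-task loss={plasticity:.3f}  old-task loss={stability:.3f}")
```

As the shift between consecutive tasks grows, the old-task loss typically rises faster than the new-task loss falls, which is the qualitative behavior the paper attributes to a non-stationary effective capacity.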

Takeaways, Limitations

Takeaways: The paper offers a new perspective on the stability-plasticity dilemma in continual learning and provides a framework, effective model capacity (CLEMC), for analyzing the dynamic behavior of neural networks. Experimental results across diverse architectures lay a foundation for quantitatively analyzing the impact of shifts in the task distribution.
Limitations: The proposed difference equation is a simplified model and may not fully capture the complexity of real neural networks. The experiments may be limited to the specific architectures and tasks studied, so further research is needed to establish generalizability to a wider range of settings, as well as on practical applications and optimization methods built on CLEMC.