Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
The summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning

Created by
  • Haebom

Author

Naicheng He, Kaicheng Guo, Arjun Prakash, Saket Tiwari, Ruo Yu Tao, Tyrone Serapio, Amy Greenwald, George Konidaris

Outline

This paper investigates why deep neural networks in deep continual learning fail to learn new tasks without parameter reinitialization, a phenomenon known as loss of plasticity. The authors find that this failure is preceded by a collapse of the Hessian spectrum at new-task initialization: meaningful curvature directions vanish and gradient descent becomes ineffective. To characterize the conditions required for successful learning, they introduce the notion of $\tau$-trainability and show that existing plasticity-preserving algorithms can be unified within this framework. Targeting spectral collapse directly through a Kronecker-factored approximation of the Hessian, they propose two regularizers: maintaining high effective feature rank and applying an L2 penalty. Experiments on continual supervised learning and reinforcement learning tasks show that combining these two regularizers effectively preserves plasticity.
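To make the two regularizers concrete, here is a minimal PyTorch sketch of a combined penalty that rewards high effective feature rank and adds a standard L2 term. This is an illustration, not the authors' implementation: the function names (`effective_rank`, `regularized_loss`), the entropy-based definition of effective rank, and the hyperparameter values (`lam_rank`, `lam_l2`) are assumptions chosen for clarity and may differ from the paper's exact formulation.

```python
# Hedged sketch (not the authors' code): a combined regularizer that
# (i) encourages high effective feature rank and (ii) applies an L2 penalty.
# Effective rank is taken here as exp(entropy of the normalized singular
# values) of the feature matrix -- a common definition; the paper's exact
# form may differ.
import torch


def effective_rank(features: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable effective rank of a (batch x dim) feature matrix."""
    s = torch.linalg.svdvals(features)           # singular values
    p = s / (s.sum() + eps)                      # normalize to a distribution
    entropy = -(p * torch.log(p + eps)).sum()    # Shannon entropy
    return torch.exp(entropy)                    # effective rank


def regularized_loss(task_loss, features, params, lam_rank=1e-3, lam_l2=1e-4):
    """Task loss minus a reward for high effective rank, plus an L2 penalty.

    `lam_rank` and `lam_l2` are illustrative hyperparameters,
    not values taken from the paper.
    """
    rank_term = -lam_rank * effective_rank(features)          # push rank up
    l2_term = lam_l2 * sum((w ** 2).sum() for w in params)    # standard L2
    return task_loss + rank_term + l2_term
```

In a continual-learning loop this penalty would simply be added to each task's loss before backpropagation, keeping the feature representation from collapsing onto a few directions while the L2 term keeps parameter magnitudes in check.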

Takeaways, Limitations

Takeaways:
A new understanding of loss of plasticity in deep continual learning: Hessian spectral collapse is identified as the main cause.
Provides a framework, $\tau$-trainability, that unifies existing plasticity-preserving algorithms.
A practical methodology for preserving plasticity via Hessian-motivated regularization (illustrated in the sketch above): maintaining high effective feature rank and applying an L2 penalty.
Demonstrates the effectiveness of the proposed methodology on continual supervised learning and reinforcement learning tasks.
Limitations:
The Hessian approximation and regularization methods add computational overhead.
Further research is needed to determine optimal hyperparameter settings.
Generalizability to other continual learning scenarios remains to be examined.