Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated using Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Forget Forgetting: Continual Learning in a World of Abundant Memory

Created by
  • Haebom

Authors

Dongkyu Cho, Taesup Moon, Rumi Chunara, Kyunghyun Cho, Sungmin Cha

Outline

This paper considers a realistic continual learning (CL) setting in which GPU-time constraints outweigh memory constraints. Unlike most prior CL research, we explore a "middle ground" where memory is abundant enough to mitigate forgetting, but full retraining from scratch is too costly. We find that in this setting, models become biased toward previous tasks and struggle to learn new ones, suggesting that plasticity, rather than stability, is the key challenge. Accordingly, we propose Weight Space Consolidation, which combines plasticity recovery via rank-based parameter resets with stability enhancement via weight averaging. The method outperforms strong baselines on class-incremental learning with image classifiers and on continual instruction tuning of large language models, at a substantially lower computational cost than full retraining, offering a scalable alternative.
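To make the two ingredients concrete, here is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the paper's implementation: it assumes parameters are ranked by absolute magnitude, reset weights are re-initialized with Kaiming initialization, and weight averaging is a simple linear interpolation with a pre-task checkpoint. `reset_frac`, `coef`, and `train_on_task` are hypothetical placeholders, not the paper's criteria or hyperparameters.

```python
import copy
import torch
import torch.nn as nn

def rank_based_reset(model: nn.Module, reset_frac: float = 0.1) -> None:
    """Plasticity recovery: re-initialize the lowest-ranked fraction of each
    weight tensor (assumption: rank by absolute magnitude)."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:
                continue  # skip biases / norm parameters in this sketch
            k = max(1, int(reset_frac * p.numel()))
            threshold = p.abs().flatten().kthvalue(k).values
            mask = p.abs() <= threshold
            fresh = torch.empty_like(p)
            nn.init.kaiming_uniform_(fresh)  # fresh values for reset entries
            p[mask] = fresh[mask]

def weight_average(model: nn.Module, anchor_state: dict, coef: float = 0.5) -> None:
    """Stability enhancement: interpolate current weights with an anchor
    checkpoint (e.g., the weights saved before the new task)."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.mul_(1.0 - coef).add_(anchor_state[name], alpha=coef)

# Illustrative per-task loop (train_on_task is a placeholder):
# anchor = copy.deepcopy(model.state_dict())
# rank_based_reset(model)          # restore plasticity before the new task
# train_on_task(model, new_task)   # ordinary fine-tuning on the new data
# weight_average(model, anchor)    # consolidate toward the anchor weights
```

The intuition behind this split: the reset step frees under-used capacity so the model can fit the new task, while the averaging step pulls the solution back toward weights that still perform well on earlier tasks.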

Takeaways, Limitations

Takeaways:
We reframe the core challenge of CL in memory-rich environments from stability to plasticity, centering the problem on GPU-time constraints.
We propose Weight Space Consolidation (WSC), a lightweight CL method that outperforms existing state-of-the-art (SOTA) techniques at lower cost.
We demonstrate the effectiveness of WSC in both class-incremental learning and continual instruction tuning of large language models, showing its applicability across CL settings.
The work challenges assumptions of existing CL research and presents new benchmarks applicable to real-world environments.
Limitations:
Further research may be needed to evaluate the generalization of the proposed method beyond the settings tested.
Additional detail may be needed on WSC's specific hyperparameter settings and optimization procedure.
A quantitative analysis of the GPU-time savings may be lacking.