Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Soft-Transformers for Continual Learning

Created by
  • Haebom

Author

Haeyong Kang, Chang D. Yoo

Outline

Inspired by the Well-initialized Lottery Ticket Hypothesis (WLTH), this paper proposes Soft-Transformers (Soft-TF), a novel fully fine-tuned continual learning (CL) method that sequentially trains and selects an optimal soft network for each task. Soft-TF keeps the parameters of the pre-trained layers fixed during continual learning while optimizing the weights of sparse layers, using well-initialized Soft-TF masks to obtain task-adaptive soft (real-valued) networks. During inference, the identified task-adaptive network masks the parameters of the pre-trained network, mapping them to a near-optimal solution for each task and thereby minimizing catastrophic forgetting (CF). The soft masking also preserves the knowledge of the pre-trained network. Extensive experiments on a Vision Transformer (ViT) and a Language Transformer (BERT) demonstrate the effectiveness of Soft-TF, which achieves state-of-the-art performance in vision and language class-incremental learning (CIL) scenarios.
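
To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of how a task-adaptive soft mask could modulate a frozen pre-trained layer in PyTorch. The class name, mask initialization, and per-task indexing are illustrative assumptions; the actual Soft-TF method applies well-initialized masks across transformer layers with its own training procedure.

```python
import torch
import torch.nn as nn

class SoftMaskedLinear(nn.Module):
    """Sketch: a frozen pre-trained linear layer whose weights are modulated
    element-wise by a learnable, real-valued (soft) mask selected per task."""
    def __init__(self, pretrained: nn.Linear, num_tasks: int):
        super().__init__()
        # Keep the pre-trained parameters fixed (no gradient updates).
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
        # One soft mask per task, initialized at 1 (identity mapping) as a
        # stand-in for the paper's "well-initialized" masks (assumption).
        self.masks = nn.ParameterList(
            [nn.Parameter(torch.ones_like(self.weight)) for _ in range(num_tasks)]
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Training updates only masks[task_id]; at inference the selected
        # task mask rescales the frozen pre-trained weights.
        masked_weight = self.weight * self.masks[task_id]
        return nn.functional.linear(x, masked_weight, self.bias)

# Usage: wrap a pre-trained layer and run a forward pass with the mask for task 0.
layer = SoftMaskedLinear(nn.Linear(768, 768), num_tasks=5)
out = layer(torch.randn(2, 768), task_id=0)
```

Because the pre-trained weights never change, switching tasks only means selecting a different mask, which is the mechanism the paper relies on to avoid catastrophic forgetting.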

Takeaways, Limitations

Takeaways:
A novel method for effectively applying the Well-initialized Lottery Ticket Hypothesis to continual learning is presented.
Task-adaptive soft networks effectively mitigate catastrophic forgetting.
Experiments with ViT and BERT demonstrate state-of-the-art performance in both vision and language domains.
Soft masking effectively preserves the knowledge of the pre-trained network.
Limitations:
The computational cost and complexity of the proposed method are not analyzed.
Further validation of generalization performance across diverse datasets and tasks is needed.
A more detailed explanation of the optimization strategy for the Soft-TF masks is needed.
Possible dependence on specific architectures (ViT, BERT).