Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model

Created by
  • Haebom

Author

Kai Tong, Kang Pan, Xiao Zhang, Erli Meng, Run He, Yawen Cui, Nuoyan Guo, Huiping Zhuang

Outline

This paper proposes Analytic Subspace Routing (ASR) to address continual learning (CL) in large language models (LLMs). Existing continual learning techniques either replay previous data, incurring additional computational cost, or rely on a single parameter-efficient module, which limits the absorption of new knowledge. ASR isolates the learning for each task within a subspace of deep-layer features, eliminating knowledge interference between tasks, and uses an analytic routing mechanism to efficiently exploit the knowledge learned in the different subspaces. The router is a multi-task model trained with recursive least squares, so it adapts dynamically to incoming data without accessing past data, assigns the current input to the appropriate subspace, and guarantees the non-forgetting property for previously learned tasks. Experiments show that ASR retains prior knowledge almost perfectly while seamlessly integrating new information, overcoming the limitations of existing methods.
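
To illustrate the routing idea, the sketch below shows how a linear router can be trained one sample at a time with recursive least squares, so that no past data needs to be stored. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name RLSRouter, the feature and task dimensions, and the regularization constant gamma are all illustrative.

```python
import numpy as np

class RLSRouter:
    """Minimal recursive-least-squares router sketch (illustrative, not the paper's code).

    Maintains a linear map W from frozen deep-layer features to task logits.
    Each update uses only the current sample, so no past data is stored --
    the property ASR relies on to avoid forgetting.
    """

    def __init__(self, feature_dim: int, num_tasks: int, gamma: float = 1e3):
        # W: (feature_dim x num_tasks) router weights, initialized to zero.
        self.W = np.zeros((feature_dim, num_tasks))
        # P: inverse of the regularized feature autocorrelation matrix.
        self.P = gamma * np.eye(feature_dim)

    def update(self, x: np.ndarray, task_id: int) -> None:
        """One RLS step on a single (feature, task-label) pair."""
        x = x.reshape(-1, 1)                   # column vector
        y = np.zeros((self.W.shape[1], 1))
        y[task_id] = 1.0                       # one-hot task label
        Px = self.P @ x
        k = Px / (1.0 + x.T @ Px)              # RLS gain vector
        err = y.T - x.T @ self.W               # prediction error, shape (1, num_tasks)
        self.W += k @ err                      # weight update
        self.P -= k @ Px.T                     # rank-1 downdate of P (P stays symmetric)

    def route(self, x: np.ndarray) -> int:
        """Pick the subspace/adapter whose logit is largest."""
        return int(np.argmax(x.reshape(-1) @ self.W))

# Streaming usage: the router adapts online as tasks arrive, never revisiting old data.
router = RLSRouter(feature_dim=64, num_tasks=3)
rng = np.random.default_rng(0)
for task in range(3):
    center = rng.normal(size=64)               # toy per-task feature cluster
    for _ in range(50):
        router.update(center + 0.1 * rng.normal(size=64), task)
print(router.route(center))                    # routes the last cluster to task 2
```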

Takeaways, Limitations

Takeaways:
Presents an effective solution to continual learning in LLMs, addressing the computational cost and knowledge interference limitations of existing methods.
Validates the effectiveness of Analytic Subspace Routing (ASR): experiments demonstrate near-perfect retention of prior knowledge and smooth integration of new information.
Shows efficient use of a multi-task router: it adapts dynamically without accessing past data and guarantees the non-forgetting property.
Limitations:
Code will only be released after paper acceptance, which limits immediate reproducibility verification.
The subspace allocation strategy for specific tasks is not described in detail; additional analysis is needed to determine how it affects ASR's performance.
Generalizability to other LLM architectures and tasks remains to be verified; results obtained in a limited experimental setting do not guarantee performance in other environments.