Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces

Created by
  • Haebom

Authors

Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Mi Tian, Hua Huang

Outline

In this paper, we propose SubZero, a memory-efficient optimization method for fine-tuning large language models (LLMs). Existing zeroth-order optimization methods suffer from gradient-estimation variance that grows linearly with model dimension; SubZero addresses this by restricting perturbations to low-dimensional random subspaces. As a result, SubZero reduces memory consumption, improves training performance, and converges faster than existing zeroth-order methods. Experiments on a range of language modeling tasks confirm the advantages of SubZero, and the source code is publicly released.
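
For intuition, the sketch below illustrates the core idea described above: a two-point zeroth-order gradient estimate whose perturbation is confined to a random low-rank subspace of a weight matrix, so the effective search dimension (and hence the estimator variance) is much smaller than the full parameter count. This is a simplified illustration under assumed names (`loss_fn`, `rank`, `eps`), not the authors' exact SubZero algorithm.

```python
import numpy as np

def zo_subspace_grad(loss_fn, W, rank=4, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate for a weight matrix W,
    with the perturbation restricted to a random low-rank subspace.

    Illustrative sketch of the random-subspace idea only; `loss_fn`,
    `rank`, and `eps` are assumed names, not the paper's API.
    """
    rng = rng or np.random.default_rng()
    m, n = W.shape
    # Random low-rank subspace: orthonormal factors U (m x r) and V (n x r).
    U, _ = np.linalg.qr(rng.standard_normal((m, rank)))
    V, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    # Low-dimensional perturbation coefficients (r x r instead of m x n).
    Z = rng.standard_normal((rank, rank))
    P = U @ Z @ V.T                       # full-size perturbation of rank <= r
    # Two-point finite difference of the loss along the perturbation.
    l_plus = loss_fn(W + eps * P)
    l_minus = loss_fn(W - eps * P)
    # Gradient estimate lies in the same low-rank subspace as P.
    return (l_plus - l_minus) / (2 * eps) * P

def zo_sgd_step(loss_fn, W, lr=1e-4, **kwargs):
    # Plain SGD-style update using only forward (loss) evaluations.
    return W - lr * zo_subspace_grad(loss_fn, W, **kwargs)
```

Because the estimator only needs forward passes of `loss_fn` and a low-rank perturbation, no backward pass or full-size gradient buffer has to be stored, which is where the memory savings come from.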

Takeaways, Limitations

Takeaways:
  • Proposes a novel memory-efficient optimization technique for fine-tuning large language models.
  • Addresses the high gradient-estimation variance that limits existing zeroth-order optimization methods.
  • Achieves improved training performance and faster convergence.
  • Validates effectiveness on real language modeling tasks and releases the source code.
Limitations:
  • Further research is needed on how well the method generalizes to LLM architectures and model sizes beyond those evaluated.
  • Hyperparameter tuning and further optimization of the method require additional study.
  • The reported results may be limited to specific datasets and tasks; more extensive experiments are needed.