Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

HD-PiSSA: High-Rank Distributed Orthogonal Adaptation

Created by
  • Haebom

Authors

Yiding Wang, Fanxu Meng, Xuefeng Zhang, Fan Jiang, Pingzhi Tang, Muhan Zhang

Outline

This paper proposes HD-PiSSA (High-rank Distributed PiSSA), a distributed Parameter-Efficient Fine-Tuning (PEFT) method for efficiently fine-tuning large language models (LLMs). Existing PEFT methods such as LoRA and PiSSA restrict model updates to a low-rank subspace, which limits their expressiveness and hinders performance on complex tasks. HD-PiSSA instead initializes orthogonal adapters across multiple devices and fine-tunes them by aggregating their delta updates on the full weight matrix W. Because each GPU is assigned a different set of principal components of the pre-trained weights, HD-PiSSA covers a far wider range of update directions than data-parallel LoRA or PiSSA. On a range of demanding downstream tasks, including mathematics, code generation, and multi-task learning, HD-PiSSA outperforms LoRA by an average of 10.0 points and PiSSA by an average of 4.98 points.
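To make the distribution scheme concrete, here is a minimal sketch in PyTorch of the core idea as summarized above: each (simulated) device receives a disjoint rank-r slice of the SVD of a pre-trained weight matrix as its adapter initialization, so adapters are orthogonal across devices, and the per-device adapter deltas are summed into one high-rank update to W. Function names such as `init_distributed_adapters` and `aggregate_delta` are illustrative assumptions, not the paper's actual API; real training would shard the adapters across GPUs with collective communication rather than a single-process loop.

```python
# Minimal sketch of HD-PiSSA-style initialization and delta aggregation.
# Assumptions: one linear layer, num_devices simulated in a single process.
import torch

def init_distributed_adapters(W: torch.Tensor, num_devices: int, r: int):
    """Assign a disjoint rank-r slice of W's principal components to each device.

    Device k is initialized from singular directions r*k .. r*(k+1)-1, so the
    adapters are orthogonal across devices and the aggregate update can reach
    rank num_devices * r instead of r.
    """
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    adapters = []
    for k in range(num_devices):
        lo, hi = k * r, (k + 1) * r
        sqrt_s = S[lo:hi].sqrt()
        A = U[:, lo:hi] * sqrt_s                 # (out_features, r), scaled columns
        B = sqrt_s.unsqueeze(1) * Vh[lo:hi, :]   # (r, in_features), scaled rows
        adapters.append((A, B))
    return adapters

def aggregate_delta(adapters_before, adapters_after):
    """Sum each device's adapter change into a single full update to W."""
    delta = torch.zeros_like(adapters_before[0][0] @ adapters_before[0][1])
    for (A0, B0), (A1, B1) in zip(adapters_before, adapters_after):
        delta += A1 @ B1 - A0 @ B0
    return delta

# Example: split a 512x512 weight across 4 simulated devices with rank 16,
# giving an effective aggregate update rank of up to 64.
W = torch.randn(512, 512)
adapters = init_distributed_adapters(W, num_devices=4, r=16)
```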

Takeaways and Limitations

Takeaways:
  • HD-PiSSA improves LLM fine-tuning performance by enabling higher-rank updates than existing methods through a distributed PEFT approach.
  • It shows particularly notable gains in multi-task learning settings.
  • It increases fine-tuning flexibility by leveraging additional GPU resources.
Limitations:
  • It requires multiple GPUs and a distributed training setup.
  • The paper may lack a detailed description of HD-PiSSA's specific implementation and optimization choices.
  • The experiments cover specific tasks and model architectures, so further work is needed to establish generalizability.