Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

Kron-LoRA: Hybrid Kronecker-LoRA Adapters for Scalable, Sustainable Fine-tuning

Created by
  • Haebom

Author

Yixin Shen

Outline

This paper addresses fine-tuning large pre-trained language models across multiple tasks with adapters that balance parameter efficiency and expressive power. It introduces Kron-LoRA, a hybrid adapter that combines Kronecker decomposition with LoRA-style low-rank compression, using up to four times fewer parameters than standard LoRA while retaining comparable expressive power. Experiments on eight benchmarks with DistilBERT, Mistral-7B, LLaMA-2-7B, and LLaMA-3-8B show that Kron-LoRA matches or outperforms LoRA baselines with a lower memory footprint and only a 5-8% speed overhead. In sequential fine-tuning, it achieves competitive cross-task transfer while using only about a quarter of the adapter parameters. Kron-LoRA therefore offers a scalable and sustainable option for multi-task adaptation of large language models.
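
As a rough illustration of the idea, the PyTorch sketch below wraps a frozen nn.Linear with a hypothetical adapter whose weight update is a Kronecker product of a small dense factor A and a LoRA-style low-rank factor B1 @ B2. The class name KronLoRALinear, the chosen shapes, and this exact factorization are assumptions made for illustration, not the paper's released code; the demo at the bottom only shows how such a structure can bring the trainable adapter parameters to roughly a quarter of a rank-8 LoRA on a 768x768 layer, in line with the reduction the summary reports.

```python
import torch
import torch.nn as nn


class KronLoRALinear(nn.Module):
    """Illustrative Kronecker + low-rank adapter on top of a frozen linear layer.

    The update is modelled here as delta_W = A kron (B1 @ B2): a small dense
    Kronecker factor A combined with a LoRA-style low-rank factor B1 @ B2.
    This parameterization is an assumption for illustration and may differ
    from the paper's exact construction.
    """

    def __init__(self, base: nn.Linear, a_out: int, a_in: int, rank: int = 8, scale: float = 1.0):
        super().__init__()
        out_f, in_f = base.out_features, base.in_features
        assert out_f % a_out == 0 and in_f % a_in == 0, "Kronecker factor must tile the weight shape"
        b_out, b_in = out_f // a_out, in_f // a_in

        self.base = base
        for p in self.base.parameters():   # the pre-trained weights stay frozen
            p.requires_grad_(False)

        self.A = nn.Parameter(torch.randn(a_out, a_in) * 0.01)   # small dense Kronecker factor
        self.B1 = nn.Parameter(torch.randn(b_out, rank) * 0.01)  # low-rank factors of the
        self.B2 = nn.Parameter(torch.zeros(rank, b_in))          # second Kronecker block
        self.scale = scale

    def delta_weight(self) -> torch.Tensor:
        # Materialize delta_W = A kron (B1 @ B2). An efficient implementation would
        # exploit the Kronecker structure instead of forming the full matrix.
        return torch.kron(self.A, self.B1 @ self.B2) * self.scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.delta_weight().T


if __name__ == "__main__":
    d, r = 768, 8
    layer = KronLoRALinear(nn.Linear(d, d), a_out=4, a_in=4, rank=r)
    print(layer(torch.randn(2, d)).shape)       # torch.Size([2, 768])

    kron_lora_params = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    plain_lora_params = 2 * d * r               # rank-r LoRA on the same d x d layer
    print(kron_lora_params, plain_lora_params)  # 3088 vs. 12288, roughly a 4x reduction
```

Zero-initializing B2 follows the usual LoRA convention: the adapter contributes nothing at the start of training, so fine-tuning begins exactly from the pre-trained model.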

Takeaways, Limitations

Takeaways:
Kron-LoRA achieves similar performance with up to 4x fewer parameters than conventional LoRA, enabling parameter-efficient fine-tuning.
It demonstrates competitive performance compared to LoRA across various models and benchmarks.
It is also effective in sequential fine-tuning and supports resource-efficient multi-task adaptation.
It provides a practical solution for sustainable multi-task adaptation of large-scale language models.
Limitations:
There is a speed overhead of 5-8%.
Further research is needed to determine whether the reported results generalize to other model families and tasks.
The possibility that Kron-LoRA's performance gains are biased toward specific tasks or models cannot be ruled out.