Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Activated LoRA: Fine-tuned LLMs for Intrinsics

Created by
  • Haebom

Authors

Kristjan Greenewald, Luis Lastras, Thomas Parnell, Vraj Shah, Lucian Popa, Giulio Zizzo, Chulaka Gunasekara, Ambrish Rawat, David Cox

Outline

Low-Rank Adaptation (LoRA) is an efficient framework for fine-tuning large foundation models and is widely used for data-driven customization of LLMs. However, switching between LoRAs in a multi-turn setting is inefficient, because the key-value (KV) cache for the entire turn history must be recomputed with the LoRA weights. To address this, the paper proposes Activated LoRA (aLoRA), an adapter architecture that applies the adapted weights only to tokens that appear in the sequence after the aLoRA is invoked. This lets aLoRA reuse the base model's KV cache for the input string, so it can be activated instantly within a chain without recomputing keys and values for earlier tokens. In turn, this enables specialized models, called "intrinsics," that are invoked to perform well-defined tasks on specific input chains or conversation segments. The authors train a set of aLoRA-based intrinsics models that achieve accuracy competitive with standard LoRA while significantly improving inference efficiency, and they have contributed the aLoRA implementation to the Hugging Face PEFT library.
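The core mechanism can be illustrated with a minimal sketch (not the authors' implementation and not the PEFT API): a LoRA-style linear projection whose low-rank update is applied only to tokens at or after the invocation point, so the outputs, and hence the cached keys and values, for earlier tokens stay identical to the base model's. The class name `ALoRALinear`, the `invocation_idx` argument, and all shapes below are illustrative assumptions.

```python
# Conceptual sketch of an aLoRA-style projection (illustrative, not the paper's code).
import torch
import torch.nn as nn

class ALoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                      # frozen base projection
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # standard LoRA init: start from the base model
        self.scaling = alpha / rank
        for p in self.base.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor, invocation_idx: int) -> torch.Tensor:
        # x: (batch, seq_len, in_features). Tokens before invocation_idx get only the
        # base projection, so their outputs (and cached K/V) match the base model and
        # the existing KV cache can be reused; later tokens also get the low-rank delta.
        out = self.base(x)
        delta = self.lora_B(self.lora_A(x)) * self.scaling
        mask = torch.zeros(x.shape[1], 1, device=x.device, dtype=x.dtype)
        mask[invocation_idx:] = 1.0
        return out + delta * mask

# Example: only the last 4 tokens (the "intrinsic" invocation) see the adapted weights.
proj = ALoRALinear(nn.Linear(64, 64), rank=4)
hidden = torch.randn(1, 16, 64)
y = proj(hidden, invocation_idx=12)
```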

Takeaways, Limitations

Takeaways:
Proposes the aLoRA architecture to address LoRA's inefficiency in multi-turn environments.
Improves inference efficiency, since aLoRA can be activated immediately without recomputing the KV cache.
Suggests the feasibility of building specialized "intrinsics" models.
Demonstrates accuracy comparable to standard LoRA with improved inference efficiency.
Contributes an aLoRA implementation to the Hugging Face PEFT library.
Limitations:
Detailed experimental results and performance comparisons are not provided (since this is a paper summary).
Potential drawbacks or limitations of aLoRA are not discussed.