This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
Delta Activations: A Representation for Finetuned Large Language Models
Created by
Haebom
Author
Zhiqiu Xu, Amish Sethi, Mayur Naik, Ser-Nam Lim
Outline
This paper observes that the emergence of powerful open-source LLMs has produced a vast collection of post-trained models adapted to diverse tasks and domains. However, inconsistent metadata and unstructured repositories make these models difficult to explore and understand. The authors propose Delta Activations, a method for representing fine-tuned models as vector embeddings by measuring the shift in internal activations relative to the base model. This representation enables effective clustering by domain and task, revealing structure in the model landscape. Delta Activations exhibit desirable properties, including robustness to the fine-tuning setting and additivity when fine-tuning datasets are mixed. Furthermore, Delta Activations can embed tasks via few-shot fine-tuning, demonstrating additional potential for model selection and merging. The authors hope Delta Activations will facilitate the reuse of publicly available models. The code can be found at https://github.com/OscarXZQ/delta_activations .
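The core idea can be sketched in a few lines. The following is a minimal toy illustration, not the paper's implementation: it assumes the embedding is the mean difference of hidden-state vectors between the fine-tuned and base model over a shared set of probe prompts, with synthetic arrays standing in for real model activations (see the linked repository for the actual probing setup).

```python
import numpy as np

def delta_activation(base_hidden, tuned_hidden):
    # base_hidden / tuned_hidden: (n_prompts, d_model) arrays of, e.g.,
    # last-token hidden states of the base and fine-tuned model on the
    # same probe prompts. The embedding is the mean activation shift.
    return (np.asarray(tuned_hidden) - np.asarray(base_hidden)).mean(axis=0)

def cosine(u, v):
    # Cosine similarity for comparing model embeddings.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy setup: two models "fine-tuned" on similar data shift activations
# in a similar direction; a third model shifts in an unrelated direction.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))            # stand-in for base-model activations
shift_math = rng.normal(size=16)           # hypothetical "math" fine-tuning shift
shift_code = rng.normal(size=16)           # hypothetical "code" fine-tuning shift

emb_a = delta_activation(base, base + shift_math)
emb_b = delta_activation(base, base + shift_math + 0.1 * rng.normal(size=(8, 16)))
emb_c = delta_activation(base, base + shift_code)

print(cosine(emb_a, emb_b) > cosine(emb_a, emb_c))  # same-domain pair is closer
```

In this linear toy model, fine-tuning on a mixture of the two datasets shifts activations by the sum of the individual shifts, so the embeddings add, mirroring the additivity property noted above.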
Delta Activations is presented as a novel method for effectively representing and comparing fine-tuned LLMs.
◦ It clusters LLMs by domain and task, facilitating model exploration and understanding.
◦ It suggests potential applications in model selection and merging.
◦ It can promote the reuse of publicly available LLMs.
• Limitations:
◦ Further research is needed on how well Delta Activations generalize across different LLM architectures and fine-tuning settings.
◦ Further analysis is needed of the interpretability and reliability of Delta Activations for specific tasks or domains.
◦ The scalability and computational cost of the proposed method require further evaluation.