Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

On Task Vectors and Gradients

Created by
  • Haebom

Authors

Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Giuseppe Alessio D'Inverno, Fabrizio Silvestri, Emanuele Rodolà

Outline

This paper provides a rigorous theoretical foundation for task arithmetic, a popular technique for merging multiple fine-tuned models. Although task arithmetic has been empirically successful, a clear theoretical explanation of why it works and under which conditions it applies has been lacking. The paper addresses this gap by establishing a relationship between a task vector and the gradient of the corresponding task loss: under standard gradient descent, the task vector produced by a single epoch of fine-tuning is exactly the negative task-loss gradient scaled by the learning rate. In the multi-epoch setting this relationship holds only approximately, but the authors show that the approximation error can be explicitly bounded for feedforward networks. Experimental analysis on seven vision benchmarks shows that the first-epoch gradient dominates the fine-tuning trajectory in both norm and direction, suggesting that merging models fine-tuned for a single epoch can achieve performance comparable to merging fully converged models. In conclusion, the study reframes task arithmetic as a form of approximate multi-task learning, providing clear evidence for its effectiveness and highlighting the important role of early training dynamics in model merging.
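
To make the core identity concrete, here is a minimal sketch in standard notation; the symbols θ_pre, θ_ft, τ, η, and L are our labels for illustration and are not necessarily the paper's notation:

```latex
% Task vector: difference between fine-tuned and pretrained weights.
\tau = \theta_{\text{ft}} - \theta_{\text{pre}}

% One epoch of full-batch gradient descent with learning rate \eta is a
% single update step, so the task vector is exactly the negative scaled
% gradient of the task loss at the pretrained weights:
\theta_{\text{ft}} = \theta_{\text{pre}} - \eta \nabla L(\theta_{\text{pre}})
\quad\Longrightarrow\quad
\tau = -\eta \nabla L(\theta_{\text{pre}})

% Merging T task vectors with coefficients \alpha_t then reads as one
% (approximate) gradient step on a weighted multi-task objective:
\theta_{\text{merged}}
  = \theta_{\text{pre}} + \sum_{t=1}^{T} \alpha_t \tau_t
  = \theta_{\text{pre}} - \eta \sum_{t=1}^{T} \alpha_t \nabla L_t(\theta_{\text{pre}})
```

The last line is what licenses the paper's reframing: adding task vectors to the pretrained weights approximates a single multi-task gradient update.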

Takeaways, Limitations

Takeaways:
  • Provides a theoretical basis for the effectiveness of task arithmetic.
  • Clarifies the relationship between task vectors and gradients.
  • Shows that merging single-epoch fine-tuned models can achieve performance comparable to merging fully converged models (see the sketch after this list).
  • Reinterprets task arithmetic as approximate multi-task learning.
  • Emphasizes the importance of early training dynamics.
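
As a hedged illustration of the merging takeaway, here is a minimal PyTorch-style sketch of task-arithmetic merging over parameter dictionaries; the function names and the uniform scaling coefficient alpha are illustrative assumptions, not code from the paper:

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    # Task vector tau = theta_ft - theta_pre, computed per parameter tensor.
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def merge_with_task_arithmetic(pretrained: dict, task_vectors: list, alpha: float = 0.3) -> dict:
    # theta_merged = theta_pre + alpha * sum_t tau_t.
    # If each model was fine-tuned for a single full-batch epoch, each
    # tau_t = -eta * grad L_t, so this is (approximately) one gradient
    # step on a weighted multi-task objective.
    merged = {name: p.clone() for name, p in pretrained.items()}
    for tau in task_vectors:
        for name in merged:
            merged[name] += alpha * tau[name]
    return merged

# Usage sketch: state dicts from one pretrained checkpoint and per-task fine-tunes.
# taus = [task_vector(pre_sd, ft_sd) for ft_sd in finetuned_state_dicts]
# merged_sd = merge_with_task_arithmetic(pre_sd, taus)
# model.load_state_dict(merged_sd)
```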
Limitations:
  • The theoretical analysis focuses primarily on feedforward networks; generalization to other architectures requires further research.
  • The bound on the approximation error in the multi-epoch setting may vary with network architecture and hyperparameters.
  • The experimental analysis is limited to vision benchmarks; generalizability to other domains requires further validation.