Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

On Task Vectors and Gradients

Created by
  • Haebom

Authors

Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Giuseppe Alessio D'Inverno, Fabrizio Silvestri, Emanuele Rodolà

Theoretical Foundations of Task Arithmetic

Outline

Task arithmetic has emerged as a simple yet powerful model-merging technique for combining multiple fine-tuned models into one. This paper provides a theoretical foundation for it by establishing a connection between task vectors and the gradients of the task losses. Under standard gradient descent, the task vector produced by one epoch of fine-tuning is shown to be exactly the negative gradient of the task loss scaled by the learning rate. For multi-epoch fine-tuning, the paper proves that this equivalence still holds approximately, with a second-order error term that is explicitly bounded for feed-forward networks.
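
To make the one-epoch claim concrete, here is a short sketch in my own notation (θ₀ the pretrained weights, η the learning rate, L the task loss); the multi-step expansion is only an informal illustration of where the second-order error comes from, not the paper's formal statement:

```latex
\begin{align*}
  \theta_1 &= \theta_0 - \eta \nabla L(\theta_0)
    && \text{one full-batch gradient step (``one epoch'')} \\
  \tau &= \theta_1 - \theta_0 = -\eta\, \nabla L(\theta_0)
    && \text{task vector $=$ scaled negative gradient} \\
  \tau_{K\ \text{steps}} &= -\eta \sum_{k=0}^{K-1} \nabla L(\theta_k)
    = -\eta K\, \nabla L(\theta_0) + O(\eta^2)
    && \text{multi-epoch case: exact equality becomes an approximation}
\end{align*}
```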

Takeaways, Limitations

Takeaways:
Provides clear evidence for why task arithmetic is effective.
Emphasizes the importance of early training dynamics in model merging.
Suggests that merging models fine-tuned for only one epoch can perform similarly to merging fully converged models.
Reframes task arithmetic as a form of approximate multi-task learning (see the sketch at the end of this section).
Limitations:
The explicit bound on the second-order error term is derived only for feed-forward networks (not explicitly discussed in the paper).
Experiments are limited to vision benchmarks (not explicitly discussed in the paper).
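
As a toy illustration of the multi-task-learning view in the takeaways above, the following sketch (the quadratic losses, variable names, and constants are all hypothetical, chosen only for illustration) builds one-step task vectors for three synthetic tasks and checks that task-arithmetic merging lands on the same point as a single gradient step on the summed losses:

```python
import numpy as np

rng = np.random.default_rng(0)
pretrained = rng.normal(size=10)   # theta_0: shared pretrained weights
lr = 0.1                           # learning rate eta

def task_loss_grad(theta, task_seed):
    """Gradient of a toy per-task quadratic loss L_i(theta) = 0.5 * ||theta - target_i||^2."""
    target = np.random.default_rng(task_seed).normal(size=theta.shape)
    return theta - target

# "One-epoch" fine-tuning = a single full-batch gradient step per task.
task_vectors = []
for seed in (1, 2, 3):
    finetuned = pretrained - lr * task_loss_grad(pretrained, seed)
    task_vectors.append(finetuned - pretrained)   # tau_i = -lr * grad L_i(theta_0)

# Task arithmetic: add the scaled task vectors back onto the pretrained weights.
alpha = 1.0
merged = pretrained + alpha * sum(task_vectors)

# The same point is reached by one gradient step on the summed (multi-task) objective.
multi_task_step = pretrained - alpha * lr * sum(
    task_loss_grad(pretrained, seed) for seed in (1, 2, 3)
)
print("merged model matches a multi-task gradient step:",
      np.allclose(merged, multi_task_step))
```

With multi-epoch fine-tuning the two quantities would only agree approximately, which is exactly the regime the paper's second-order bound addresses.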