Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Giuseppe Alessio D'Inverno, Fabrizio Silvestri, Emanuele Rodolà
Theoretical Foundations of Task Arithmetic
Outline
Task arithmetic has emerged as a simple yet powerful model merging technique that combines multiple fine-tuned models into one. This paper establishes a connection between the task vector and the gradient of the task loss, providing a theoretical foundation for task arithmetic. Under standard gradient descent, the task vector produced by one epoch of fine-tuning is shown to be exactly the negative gradient of the loss scaled by the learning rate. For multi-epoch settings, the paper proves that this equivalence holds approximately, with a second-order (quadratic) error term that is explicitly bounded for feedforward networks.
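As a minimal sketch of this connection (notation assumed here, not taken verbatim from the paper): let \theta_0 denote the pre-trained weights, L_t the loss of task t, and \eta the learning rate. One full-batch gradient step, i.e. one epoch of standard gradient descent, gives

\theta_1 = \theta_0 - \eta \nabla L_t(\theta_0), \qquad \tau_t := \theta_1 - \theta_0 = -\eta \nabla L_t(\theta_0),

so the task vector equals the negative task-loss gradient scaled by the learning rate; with additional epochs the same identity holds up to an error term that is quadratic in the learning rate.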
Takeaways, Limitations
•
Takeaways:
◦
Provides clear evidence for why task arithmetic is effective.
◦
Emphasizes the importance of early training dynamics in model merging.
◦
Suggests that merging models fine-tuned for only one epoch can achieve performance similar to merging fully converged models.
◦
Reframes task arithmetic as a form of approximate multi-task learning (see the sketch after this list).
•
Limitations:
◦
The explicit analysis of the second-order error term is limited to feedforward networks (a limitation not explicitly stated in the paper).
◦
Experiments are limited to vision benchmarks (a limitation not explicitly stated in the paper).
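A minimal sketch of the multi-task-learning reading referenced above (notation assumed, not taken verbatim from the paper): merging T one-epoch task vectors with scaling coefficient \alpha gives

\theta_{\text{merged}} = \theta_0 + \alpha \sum_{t=1}^{T} \tau_t \approx \theta_0 - \alpha \eta \sum_{t=1}^{T} \nabla L_t(\theta_0),

which is approximately one gradient step on the sum of the task losses, i.e. task arithmetic behaves like an approximate form of multi-task learning.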