This paper provides a rigorous theoretical foundation for task arithmetic, a powerful technique for merging multiple fine-tuned models. Despite its empirical success, task arithmetic has lacked a clear theoretical explanation of why it works and under what conditions it applies. We address this gap by establishing a relationship between a task vector and the gradient of the corresponding task loss: under standard gradient descent, the task vector produced by a single epoch of fine-tuning is exactly the negative task-loss gradient scaled by the learning rate. In the multi-epoch setting this relationship holds approximately, and we show that the approximation error admits an explicit bound for feedforward networks. Experiments on seven vision benchmarks show that the first-epoch gradient dominates the fine-tuning trajectory in both norm and direction, suggesting that merging models fine-tuned for a single epoch can match the performance of merging fully converged ones. Overall, this study reframes task arithmetic as a form of approximate multi-task learning, clarifying when it is effective and highlighting the role of early training dynamics in model merging.
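The core identity above can be checked numerically. The following is a minimal sketch on a toy least-squares problem (the data, model, and names here are illustrative assumptions, not the paper's setup): after one epoch of full-batch gradient descent, the task vector equals the negative gradient times the learning rate exactly, and after several epochs it remains strongly aligned with the first-epoch gradient direction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))          # hypothetical fine-tuning inputs
y = rng.normal(size=32)               # hypothetical fine-tuning targets
w_pre = rng.normal(size=4)            # "pre-trained" weights

def grad(w):
    # Gradient of the squared-error loss 0.5 * mean((X @ w - y)**2)
    return X.T @ (X @ w - y) / len(y)

lr = 0.1

# One epoch of full-batch gradient descent: the identity is exact.
w_ft = w_pre - lr * grad(w_pre)
task_vector = w_ft - w_pre
assert np.allclose(task_vector, -lr * grad(w_pre))

# Several epochs: the task vector is no longer a single scaled gradient,
# but it stays closely aligned with the first-epoch gradient direction.
w = w_pre.copy()
for _ in range(5):
    w -= lr * grad(w)
multi_epoch_vector = w - w_pre
g0 = grad(w_pre)
cosine = multi_epoch_vector @ (-g0) / (
    np.linalg.norm(multi_epoch_vector) * np.linalg.norm(g0))
print(f"cosine with first-epoch direction: {cosine:.3f}")
```

On this convex toy problem the gradient direction changes slowly across epochs, so the cosine similarity stays near one; the paper's contribution is to bound the corresponding error for feedforward networks rather than a toy quadratic.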