Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the authors and their institutions; when sharing, please cite the source.

Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models

Created by
  • Haebom

Authors

Filippo Rinaldi, Aniello Panariello, Giacomo Salici, Fengyuan Liu, Marco Ciccone, Angelo Porrello, Simone Calderara

Outline

This paper presents an efficient method for reusing tasks trained on a previous foundation model when a new model release arrives. To address the parameter-space mismatch that arises when reusing existing parameter changes (task vectors), the authors focus on the gradient sign structure of the new model. They propose GradFix, a method that approximates the ideal sign structure using only a small number of labeled samples and uses it to transfer knowledge. GradFix adapts by computing a few gradients on the target model and masking the source task vector accordingly, with no additional fine-tuning. This effectively rebases the task vector onto the new pretraining, producing updates that are locally aligned with the target loss gradient. The method comes with a theoretical first-order descent guarantee and shows performance gains over existing approaches on vision and language benchmarks.
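To make the masking step concrete, here is a minimal, hypothetical PyTorch sketch of sign-based task-vector masking in the spirit described above. The function name gradfix_style_transfer and the arguments target_model, task_vector, few_shot_loader, loss_fn, and scale are illustrative assumptions, not the paper's reference implementation.

```python
# Hypothetical sketch: mask a source task vector by the gradient signs of the
# target model, then apply only the sign-agreeing components. Names and the
# scaling scheme are assumptions for illustration.
import torch

def gradfix_style_transfer(target_model, task_vector, few_shot_loader, loss_fn, scale=1.0):
    """task_vector: dict mapping parameter names to deltas (theta_finetuned - theta_source).

    Components of the task vector whose sign agrees with the target descent
    direction (-gradient) are kept; the rest are zeroed before being added.
    """
    target_model.train()
    target_model.zero_grad()

    # Accumulate gradients from a handful of labeled samples on the target model.
    for inputs, labels in few_shot_loader:
        loss = loss_fn(target_model(inputs), labels)
        loss.backward()

    with torch.no_grad():
        for name, param in target_model.named_parameters():
            if name not in task_vector or param.grad is None:
                continue
            delta = task_vector[name]
            # Keep entries where the task-vector direction is a descent
            # direction for the target loss: sign(delta) == sign(-grad).
            mask = (torch.sign(delta) == torch.sign(-param.grad)).to(delta.dtype)
            param.add_(scale * mask * delta)

    target_model.zero_grad()
    return target_model
```

Note that no optimizer step is taken in this sketch: the few-shot gradients serve only to build the binary mask, which is consistent with the summary's claim that no additional fine-tuning is required.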

Takeaways, Limitations

Takeaways:
  • Efficient knowledge transfer between foundation model releases: reduces the need for repeated fine-tuning, saving time and compute.
  • Few-shot adaptation: achieves strong performance with only a small number of labeled samples.
  • No additional fine-tuning required: fast and efficient knowledge transfer.
  • Theoretical grounding: a first-order descent guarantee supports the method's reliability.
  • Strong results across benchmarks: consistent gains in both vision and language domains.
Limitations:
  • Limited diversity of experimental data and models: validation on a wider range of datasets and models is needed.
  • Scalability to more complex architectures: further research is needed to determine whether GradFix's gains hold for more complex model structures.
  • Approximation of the gradient sign structure: the ideal sign structure is only approximated from a few samples and may not be perfectly recovered.
  • Hyperparameter settings: further study is needed on the settings that maximize GradFix's performance.