Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging

Created by
  • Haebom

Author

Zitao Fang, Guodong DU, Shuyang Yu, Yifei Guo, Yiwei Zhang, Yiyao Cao, Jing Li, Ho-Kin Tang, Sim Kuan Goh

Outline

This paper explores a model merging technique that integrates multiple fine-tuned models into a single multi-task model to address the generalization degradation that occurs when fine-tuning a pre-trained model on a specific dataset. Existing model merging methods fail to consider the roles, connectivity, and activation of neurons, leading to performance degradation due to task interference. This study presents NeuroMerging, a novel model merging framework based on neuronal mechanisms. NeuroMerging decomposes task-specific representations into two complementary neuronal subspaces that regulate input sensitivity and task adaptability, mitigating task interference and merging models across diverse tasks without training. We experimentally demonstrate that our approach outperforms existing methods on multi-task benchmarks in natural language and vision domains. This highlights the importance of aligning neuronal mechanisms in model merging and provides new insights into mitigating task interference and improving knowledge fusion.
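To make the core idea more concrete, below is a minimal, illustrative sketch of per-neuron subspace decomposition for training-free merging. The split into a component parallel to each pre-trained neuron's weight vector (read here as "input sensitivity") and its orthogonal remainder ("task adaptability"), as well as the function names, the averaging rule, and the alpha/beta coefficients, are assumptions made for illustration; they are not the paper's actual NeuroMerging formulation.

```python
import torch

def decompose_task_vector(w_pre: torch.Tensor, w_ft: torch.Tensor):
    """Split a task vector (w_ft - w_pre) into two per-neuron components.

    Illustrative assumption: for each output neuron (row), the component
    parallel to its pre-trained weight vector stands in for the
    "input-sensitivity" subspace, and the orthogonal remainder for the
    "task-adaptability" subspace. The actual NeuroMerging decomposition
    may differ.
    """
    tau = w_ft - w_pre                                     # task vector
    norm_sq = (w_pre * w_pre).sum(dim=1, keepdim=True).clamp_min(1e-12)
    coef = (tau * w_pre).sum(dim=1, keepdim=True) / norm_sq
    parallel = coef * w_pre                                # projection onto the pre-trained neuron direction
    orthogonal = tau - parallel                            # per-neuron orthogonal residual
    return parallel, orthogonal

def merge_layers(w_pre, w_fts, alpha=1.0, beta=1.0):
    """Training-free merge of several fine-tuned weight matrices.

    Averages the two components separately across tasks and rescales them
    with hypothetical coefficients alpha/beta before adding back to w_pre.
    """
    parallels, orthogonals = [], []
    for w_ft in w_fts:
        p, o = decompose_task_vector(w_pre, w_ft)
        parallels.append(p)
        orthogonals.append(o)
    return (w_pre
            + alpha * torch.stack(parallels).mean(dim=0)
            + beta * torch.stack(orthogonals).mean(dim=0))

# Toy usage: merge three "fine-tuned" versions of one linear layer.
torch.manual_seed(0)
w_pre = torch.randn(4, 8)
w_fts = [w_pre + 0.1 * torch.randn(4, 8) for _ in range(3)]
w_merged = merge_layers(w_pre, w_fts)
print(w_merged.shape)  # torch.Size([4, 8])
```

Treating each neuron (row) separately is what distinguishes this kind of scheme from naive weight averaging: interference is resolved at the level of individual neurons rather than whole parameter tensors.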

Takeaways, Limitations

Takeaways:
Presents NeuroMerging, a novel model merging framework that takes neuronal mechanisms into account.
Effectively alleviates task interference, a limitation of existing model merging methods.
Achieves superior performance over existing methods in natural language and vision domains.
Highlights the importance of neuronal mechanisms in model merging and offers new insights into improving knowledge fusion.
Limitations:
Further validation is needed to ensure that the performance improvements of NeuroMerging are consistent across all multi-task benchmarks.
Generalization performance across various model architectures and datasets still needs to be evaluated.
The computational cost and complexity of NeuroMerging need further analysis.