This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
This paper explores model merging, a technique that integrates multiple fine-tuned models into a single multi-task model, as a way to address the loss of generalization that occurs when a pre-trained model is fine-tuned on a specific dataset. Existing model merging methods do not consider the roles, connectivity, and activation of neurons, so task interference degrades their performance. The paper presents NeuroMerging, a novel model merging framework based on neuronal mechanisms. NeuroMerging decomposes task-specific representations into two complementary neuronal subspaces, one regulating input sensitivity and the other regulating task adaptability, which mitigates task interference and allows models to be merged across diverse tasks without training. Experiments show that the approach outperforms existing methods on multi-task benchmarks in the natural language and vision domains. These results highlight the importance of aligning neuronal mechanisms in model merging and provide new insights into mitigating task interference and improving knowledge fusion.
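The summary does not say how the two neuronal subspaces are constructed, so the following is only a minimal sketch under stated assumptions. It assumes that a neuron is one weight vector, that the task-specific change is the difference between fine-tuned and pre-trained weights, and that the two subspaces are the component parallel to the pre-trained neuron and the component orthogonal to it. The per-subspace averaging and the `lam_par` / `lam_orth` coefficients are hypothetical stand-ins, not the paper's merge rule.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(pre: Sequence[float], tau: Sequence[float]) -> Tuple[Vector, Vector]:
    """Split a task-specific change `tau` into two complementary parts.

    Assumption (not stated in the summary): one part is the projection of
    `tau` onto the pre-trained neuron `pre`, the other is the remainder,
    which is orthogonal to `pre`.
    """
    norm_sq = dot(pre, pre)
    if norm_sq == 0.0:
        # A zero neuron defines no direction: everything is "orthogonal".
        return [0.0] * len(tau), list(tau)
    scale = dot(tau, pre) / norm_sq
    parallel = [scale * p for p in pre]
    orthogonal = [t - q for t, q in zip(tau, parallel)]
    return parallel, orthogonal


def merge_neuron(
    pre: Sequence[float],
    finetuned: Sequence[Sequence[float]],
    lam_par: float = 1.0,   # hypothetical weight for the parallel part
    lam_orth: float = 1.0,  # hypothetical weight for the orthogonal part
) -> Vector:
    """Merge one neuron across several fine-tuned models without training.

    Each subspace is averaged separately (a simple placeholder rule) and
    then added back to the pre-trained weights.
    """
    if not finetuned:
        raise ValueError("need at least one fine-tuned neuron")
    parts = []
    for ft in finetuned:
        tau = [f - p for f, p in zip(ft, pre)]
        parts.append(decompose(pre, tau))
    n, dim = len(parts), len(pre)
    par_avg = [sum(par[i] for par, _ in parts) / n for i in range(dim)]
    orth_avg = [sum(orth[i] for _, orth in parts) / n for i in range(dim)]
    return [w + lam_par * a + lam_orth * b
            for w, a, b in zip(pre, par_avg, orth_avg)]


# Toy example: one 2-dimensional neuron fine-tuned on two tasks.
pre_neuron = [1.0, 0.0]
task_neurons = [[2.0, 1.0], [1.0, -3.0]]
merged = merge_neuron(pre_neuron, task_neurons)
merged_parallel_only = merge_neuron(pre_neuron, task_neurons, lam_orth=0.0)
```

Keeping the two parts separate is what makes the decomposition useful in this sketch: with both coefficients set to 1 the result equals plain task-vector averaging, while different coefficients let each subspace be weighted independently.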
Takeaways, Limitations
• Takeaways:
◦ Presents NeuroMerging, a novel model merging framework that accounts for neuronal mechanisms.
◦ Effectively alleviates task interference, a limitation of existing model merging methods.
◦ Outperforms existing methods in the natural language and vision domains.
◦ Highlights the importance of neuronal mechanisms in model merging and provides new insights into improving knowledge fusion.
• Limitations:
◦ Further validation is needed to confirm that NeuroMerging's performance improvements are consistent across all multi-task benchmarks.
◦ Generalization performance across various model architectures and datasets still needs to be evaluated.
◦ The computational cost and complexity of NeuroMerging still need to be analyzed.