Daily Arxiv

This page collects papers on artificial intelligence published around the world.
The summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper remains with its authors and their institutions; when sharing, please cite the source.

Exploring Model Kinship for Merging Large Language Models

Created by
  • Haebom

Author

Yedi Hu, Yunzhi Yao, Ningyu Zhang, Huajun Chen, Shumin Deng

Outline

This paper explores model merging, a key technique for improving the performance and efficiency of large language models (LLMs). Although the open-source community has repeatedly merged existing models to drive model evolution, a systematic understanding of the benefits of model merging and the factors behind them is still lacking. Drawing an analogy to biological evolution, the study examines model evolution through iterative merging and introduces "model kinship," a measure of the degree of similarity or relatedness between LLMs. Empirical analysis shows that model kinship is closely related to the performance gains obtained from merging, making it a useful criterion for selecting candidate models. Based on these insights, the authors propose "Top-k Greedy Merging with Model Kinship," a new merging strategy that uses model kinship as a guide to mitigate performance degradation and promote effective model evolution.
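For illustration only, the sketch below computes a kinship-style score between two fine-tuned checkpoints as the cosine similarity of their weight deltas relative to a shared base model. The function name `model_kinship`, the plain-dict-of-NumPy-arrays checkpoint representation, and the delta-cosine metric are assumptions of this sketch; the paper's own kinship measure may be defined differently.

```python
import numpy as np

def model_kinship(base, model_a, model_b):
    """Estimate kinship between two fine-tuned models as the cosine
    similarity of their flattened weight deltas w.r.t. a shared base model.

    All arguments are dicts mapping parameter names to NumPy arrays of
    identical shapes (a simplified stand-in for real checkpoints).
    """
    delta_a, delta_b = [], []
    for name, base_w in base.items():
        delta_a.append((model_a[name] - base_w).ravel())
        delta_b.append((model_b[name] - base_w).ravel())
    da = np.concatenate(delta_a)
    db = np.concatenate(delta_b)
    denom = np.linalg.norm(da) * np.linalg.norm(db)
    return float(da @ db / denom) if denom > 0 else 0.0

# Toy usage with random "checkpoints" of matching shapes.
rng = np.random.default_rng(0)
base = {"layer.weight": rng.normal(size=(4, 4))}
model_a = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in base.items()}
model_b = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in base.items()}
print(model_kinship(base, model_a, model_b))
```

Because the score depends only on parameter deltas, it can be computed without any task evaluation, which is what makes it attractive as a cheap pre-merge filter.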

Takeaways, Limitations

Takeaways:
  • By revealing that model kinship is closely related to the performance gains from model merging, the paper provides practical guidance for designing effective merging strategies.
  • It proposes a new merging strategy, "Top-k Greedy Merging with Model Kinship," which improves on existing methods by taking model kinship into account (see the sketch after this list).
  • It offers a new perspective on the model merging process through an analogy with biological evolution.
Limitations:
  • The definition and measurement of model kinship may require further refinement.
  • Generalizability across different types of LLMs and merging strategies still needs to be validated.
  • Computing model kinship can incur significant computational cost.
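As a rough illustration of the kind of kinship-guided greedy loop described above, the sketch below repeatedly merges the current top-k models, skipping partners whose kinship to the best model falls below a threshold. The interfaces (`pool` as a list of (checkpoint, score) pairs, an `evaluate` callable, a `kinship(a, b)` callable, and the `min_kinship` threshold) and the uniform-average merge are assumptions of this sketch, not the paper's exact algorithm.

```python
import numpy as np

def average_merge(models):
    """Uniformly average a list of checkpoints (dicts of NumPy arrays)."""
    return {k: np.mean([m[k] for m in models], axis=0) for k in models[0]}

def top_k_greedy_merge(pool, evaluate, kinship, k=3, rounds=5, min_kinship=0.2):
    """Greedy merging loop guided by a kinship score (illustrative only).

    `pool` is a list of (checkpoint, score) pairs, `evaluate` scores a
    checkpoint, and `kinship(a, b)` returns a similarity score; all three
    interfaces are assumptions of this sketch.
    """
    for _ in range(rounds):
        pool.sort(key=lambda item: item[1], reverse=True)
        best = pool[0][0]
        # Keep only top-k candidates sufficiently related to the current best model.
        partners = [m for m, _ in pool[1:k] if kinship(best, m) >= min_kinship]
        if not partners:
            break  # no suitable partner; stop evolving
        merged = average_merge([best] + partners)
        pool.append((merged, evaluate(merged)))
    return max(pool, key=lambda item: item[1])
```

The kinship check acts as a gate: candidates that are too unrelated to the current best model are excluded before merging, which is how the strategy aims to avoid the performance degradation mentioned in the outline.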