Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging

Created by
  • Haebom

Author

Guinan Su, Jonas Geiping

Outline

This paper proposes model merging as an efficient way to improve the reasoning capabilities of large language models (LLMs). Existing model merging methods rely on manual strategies for hyperparameter tuning, which limits the exploration of potential model combinations and requires substantial effort. The authors present an automated model merging framework that enables fine-grained exploration of merging strategies while reducing evaluation cost through multi-fidelity approximations. It supports both single- and multi-objective optimization and introduces two new search spaces: layer-wise fusion (LFS) and depth-wise integration (DIS). Evaluations across multiple benchmarks show that the framework autonomously discovers merges that further improve single-task performance even for models already fine-tuned on that task, as well as merges that optimize the multi-objective frontier across tasks. Effective merges can be found even with limited computational resources (e.g., fewer than 500 search steps).
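To make the multi-fidelity idea concrete, here is a minimal sketch (not the authors' code) of a successive-halving-style search over layer-wise interpolation weights: candidates are first scored cheaply on a few evaluation examples, and only the most promising ones are promoted to a more expensive, higher-fidelity evaluation. The toy one-weight-per-layer "models" and the `evaluate` function are placeholders for real checkpoints and benchmarks.

```python
# A minimal sketch of multi-fidelity merge search (illustrative, not the
# authors' implementation): random candidates in a layer-wise interpolation
# space are pruned with cheap low-fidelity scores before survivors are
# re-scored at higher fidelity.
import random

NUM_LAYERS = 4

# Toy stand-ins for two fine-tuned checkpoints: one weight per layer.
model_a = [1.0] * NUM_LAYERS
model_b = [0.0] * NUM_LAYERS

def merge_layerwise(alphas):
    """Interpolate the two models layer by layer (an LFS-style candidate)."""
    return [a * wa + (1 - a) * wb
            for a, wa, wb in zip(alphas, model_a, model_b)]

def evaluate(model, n_examples):
    """Placeholder benchmark: a noisy score whose noise shrinks with fidelity
    (more evaluation examples -> more reliable estimate)."""
    true_quality = sum(model) / len(model)  # pretend ground-truth quality
    noise = random.gauss(0, 1.0 / n_examples ** 0.5)
    return true_quality + noise

def multi_fidelity_search(n_candidates=32, fidelities=(8, 64, 512), keep=0.25):
    # Sample random points in the layer-wise search space.
    pool = [[random.random() for _ in range(NUM_LAYERS)]
            for _ in range(n_candidates)]
    for n_examples in fidelities:
        scored = [(evaluate(merge_layerwise(a), n_examples), a) for a in pool]
        scored.sort(key=lambda t: t[0], reverse=True)
        # Promote only the top fraction to the next, more expensive fidelity.
        pool = [a for _, a in scored[:max(1, int(len(scored) * keep))]]
    return pool[0]

best_alphas = multi_fidelity_search()
print("best layer-wise weights:", [round(a, 2) for a in best_alphas])
```

Because most candidates are discarded after seeing only a handful of examples, the total evaluation budget stays far below that of scoring every candidate at full fidelity, which is what makes searches of a few hundred steps practical.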

Takeaways, Limitations

Takeaways:
  • Presents an efficient model merging framework for improving the reasoning capabilities of large language models.
  • Overcomes the limitations of manual hyperparameter tuning.
  • Reduces search cost through multi-fidelity approximation.
  • Supports both single- and multi-objective optimization.
  • Introduces two new search spaces, LFS and DIS (see the sketch after this list).
  • Demonstrates that effective merges can be discovered even with limited computational resources.
  • Shows that merged models can surpass the performance of the original models.
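As a complement to the layer-wise interpolation sketch above, the following hypothetical example illustrates what a depth-wise (DIS-style) candidate might look like: rather than averaging weights, a candidate specifies, for each depth position, which donor model supplies that layer. The string labels stand in for real transformer blocks.

```python
# Hypothetical illustration of a depth-wise integration search space
# (assumed structure, not the paper's exact formulation): a candidate is a
# per-position choice of donor model, assembling a new layer stack.
import random

layers_a = [f"A{i}" for i in range(6)]  # layers of donor model A
layers_b = [f"B{i}" for i in range(6)]  # layers of donor model B

def sample_depth_plan(depth=6):
    """Pick, for each depth position, which donor supplies the layer."""
    return [(random.choice("AB"), i) for i in range(depth)]

def assemble(plan):
    """Build the candidate stack described by the plan."""
    return [layers_a[i] if src == "A" else layers_b[i] for src, i in plan]

plan = sample_depth_plan()
print("candidate stack:", assemble(plan))
```

Each such plan would then be scored by the same multi-fidelity loop shown earlier, so both search spaces plug into one evaluation budget.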
Limitations:
  • Further research is needed on the generalization of the proposed framework.
  • Applicability to other types of LLMs and tasks still needs validation.
  • The accuracy and efficiency of the multi-fidelity approximation require further analysis.
  • Whether performance gains on specific benchmarks transfer to other benchmarks remains to be verified.