Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

DaMoC: Efficiently Selecting the Optimal Large Language Model for Fine-tuning Domain Tasks Based on Data and Model Compression

Created by
  • Haebom

Authors

Wei Huang, Huang Wei, Yinggui Wang

Outline

This paper proposes DaMoC, a data and model compression framework for quickly selecting the optimal open-source large language model (LLM) to fine-tune for a specific domain task. DaMoC operates at two levels. At the data level, it categorizes data filtering methods into three paradigms (distribution-aware, quality-aware, and hybrid approaches), compresses tokens by increasing the density of key tokens, and optimizes the text representation by iteratively rewriting it with an LLM. At the model level, it assesses the importance of each layer with layer similarity scores, prunes low-importance layers, and introduces a sparse merge paradigm to preserve as much of the original model's behavior as possible. Extensive experiments on four datasets (medical Q&A, financial Q&A, general Q&A, and reading comprehension) show that DaMoC selects the optimal LLM while reducing training time by approximately 20x.
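
The summary does not specify how the layer similarity scores are computed. As a rough, non-authoritative sketch, the Python snippet below assumes one common proxy: cosine similarity between the hidden states entering and leaving each transformer layer, where layers that barely change their input are treated as low-importance pruning candidates. The function names, the keep_ratio parameter, and the choice of cosine similarity are illustrative assumptions, not the paper's actual method.

```python
import torch


def layer_importance_scores(hidden_states):
    """Score each layer by how much it changes the hidden representation.

    hidden_states: list of tensors [batch, seq, dim], one per layer boundary
    (e.g. the output_hidden_states=True output of a decoder-only model).
    A layer whose output is nearly identical to its input (cosine similarity
    close to 1) contributes little and is a candidate for pruning.
    """
    scores = []
    for before, after in zip(hidden_states[:-1], hidden_states[1:]):
        sim = torch.nn.functional.cosine_similarity(before, after, dim=-1)  # [batch, seq]
        # Higher importance when the layer transforms its input more.
        scores.append(1.0 - sim.mean().item())
    return scores


def select_layers_to_prune(scores, keep_ratio=0.75):
    """Return indices of the least important layers to drop."""
    n_prune = int(len(scores) * (1 - keep_ratio))
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending importance
    return sorted(order[:n_prune])


if __name__ == "__main__":
    # Toy calibration pass: 12 "layers" of random hidden states [batch=2, seq=8, dim=16].
    torch.manual_seed(0)
    states = [torch.randn(2, 8, 16) for _ in range(13)]  # 13 boundaries -> 12 layers
    scores = layer_importance_scores(states)
    print(select_layers_to_prune(scores, keep_ratio=0.75))  # indices of 3 layers to drop
```

Under these assumptions, one would collect hidden_states from a small calibration batch and drop the returned layer indices before fine-tuning; the paper's sparse merge step, which is not detailed in this summary, would then reintegrate features from the pruned layers.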

Takeaways, Limitations

Takeaways:
  • Provides a framework for efficiently selecting the optimal model for a specific task from among the many available open-source LLMs.
  • Dramatically reduces the training time needed for LLM fine-tuning (approximately 20x) through data and model compression.
  • Systematically categorizes data filtering methodologies and presents effective strategies for LLM fine-tuning.
Limitations:
  • The performance of the proposed framework may depend on the datasets and tasks used; additional experiments on a wider variety of datasets and tasks are required.
  • The specific methodology of the "sparse merge paradigm" is not described in detail, which makes reproducibility hard to assess.
  • Further verification is needed to confirm that the roughly 20x training-time reduction holds consistently across settings.