This paper proposes a Data and Model Compression framework (DaMoC) that addresses the problem of rapidly selecting the optimal model among numerous open-source large language models (LLMs) for fine-tuning on a specific domain task. DaMoC operates at two levels: data and model. At the data level, we categorize data filtering methods into three paradigms: distribution-aware, quality-aware, and hybrid approaches. We compress tokens by increasing the density of key tokens, and we optimize the text representation by iteratively rewriting it with an LLM. At the model level, we use layer-wise similarity scores to assess the importance of each layer, prune layers of low importance, and introduce a sparse-merging paradigm to preserve as much of the original model's capability as possible. Through extensive experiments on four datasets, covering medical Q&A, financial Q&A, general Q&A, and reading comprehension, we demonstrate that DaMoC selects the optimal LLM while reducing training time by approximately 20x.
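
As a rough illustration of the layer-importance idea (a minimal sketch of a common layer-similarity heuristic, not necessarily the paper's exact formulation): a layer whose output is nearly identical to its input contributes little and is a pruning candidate. The sketch below assumes hidden states collected from a Hugging Face-style forward pass with `output_hidden_states=True`; the function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def layer_importance(hidden_states):
    """Score each layer by how much it transforms its input:
    1 - cosine similarity between the layer's input and output.
    hidden_states: list of [seq_len, d] tensors of length num_layers + 1
    (embedding output followed by each layer's output)."""
    scores = []
    for h_in, h_out in zip(hidden_states[:-1], hidden_states[1:]):
        cos = F.cosine_similarity(h_in, h_out, dim=-1)  # per-token similarity
        scores.append(1.0 - cos.mean().item())  # low score => near-identity layer
    return scores

def layers_to_prune(scores, k):
    """Return indices of the k lowest-importance layers, in ascending order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:k])
```

Under this heuristic, pruning removes the layers closest to identity mappings, so the remaining network approximates the original forward computation; a subsequent merging step (sparse merging in DaMoC) can then fold residual information from pruned layers back into the retained ones.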