This paper presents Comp-Comp, a benchmarking framework for domain-specific evaluation of large language models (LLMs). Unlike existing benchmarking methods that rely on large-scale data, Comp-Comp evaluates domain-wide aspects accurately and efficiently by balancing comprehensiveness and parsimony: comprehensiveness enhances semantic recall, while parsimony reduces redundancy and noise, improving precision. Through a case study conducted at a university, this paper demonstrates the process of developing PolyBench, a high-quality, large-scale academic benchmark, using Comp-Comp, illustrating the framework's applicability to a wide range of domains.