This paper presents LETToT (Label-Free Evaluation of LLM on Tourism using Expert Tree-of-Thought), a novel framework for evaluating large language models (LLMs) in specific domains such as tourism, where annotated benchmarks are costly to construct and issues like hallucination persist. Instead of relying on labeled data, LETToT evaluates LLMs using expert-derived reasoning structures. We iteratively refine and validate hierarchical ToT components through alignment with generic quality dimensions and expert feedback, then apply the optimized expert ToT to evaluate models of various sizes (ranging from 32B to 671B parameters). Our results demonstrate that while the scaling law largely holds in this domain (DeepSeek-V3 performs best), smaller models with enhanced reasoning (e.g., DeepSeek-R1-Distill-Llama-70B) narrow the performance gap. Furthermore, we show that for models smaller than 72B parameters, explicit reasoning architectures outperform their counterparts in both accuracy and conciseness (p<0.05). This study establishes a scalable, label-free paradigm for domain-specific LLM evaluation, offering a robust alternative to costly annotated benchmarks.