This paper proposes LETToT (Label-Free Evaluation of LLMs on Tourism using Expert Tree-of-Thought), a label-free evaluation framework that leverages expert-derived reasoning structures to address the challenges of assessing large language models (LLMs) in vertical domains such as tourism, notably the high cost of annotated benchmarks and persistent issues such as hallucination. LETToT iteratively refines and validates hierarchical Tree-of-Thought (ToT) components against generic quality dimensions and expert feedback. Experimental results show that the systematically optimized expert ToT yields relative quality improvements of 4.99-14.15% over baselines. We further evaluate models of various sizes (32B-671B parameters) and confirm that scaling laws hold even in this specialized domain, with DeepSeek-V3 performing best, while reasoning-enhanced smaller models (e.g., DeepSeek-R1-Distill-Llama-70B) close the gap. For models under 72B parameters, explicit reasoning architectures yield superior accuracy and conciseness (p < 0.05). This study establishes a scalable, label-free paradigm for domain-specific LLM evaluation, offering a robust alternative to annotated benchmarks.
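To make the iterative refine-and-validate loop concrete, the following is a minimal Python sketch of how expert ToT components might be scored against generic quality dimensions and revised with expert feedback. All names (`ToTNode`, `score_tot`, `apply_expert_feedback`, `refine_expert_tot`), the listed dimensions, and the loop logic are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of LETToT's iterative expert-ToT refinement loop.
# All class/function names, quality dimensions, and scoring logic are
# illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass, field

@dataclass
class ToTNode:
    """One component of the hierarchical expert Tree-of-Thought."""
    prompt: str
    children: list = field(default_factory=list)

# Assumed generic quality dimensions used to validate candidate ToT structures.
QUALITY_DIMENSIONS = ["accuracy", "completeness", "conciseness", "relevance"]

def score_tot(tot: ToTNode, responses: list) -> dict:
    """Placeholder: rate model responses produced under `tot` on each dimension.
    In practice this would be an LLM- or expert-based rating step."""
    return {dim: 0.0 for dim in QUALITY_DIMENSIONS}

def apply_expert_feedback(tot: ToTNode, scores: dict) -> ToTNode:
    """Placeholder: experts revise weak branches of the ToT based on scores."""
    return tot

def refine_expert_tot(tot: ToTNode, responses: list,
                      rounds: int = 3, threshold: float = 0.8) -> ToTNode:
    """Iteratively refine and validate the ToT until all dimensions pass."""
    for _ in range(rounds):
        scores = score_tot(tot, responses)
        if all(s >= threshold for s in scores.values()):
            break  # validated: every quality dimension meets the threshold
        tot = apply_expert_feedback(tot, scores)
    return tot
```

The sketch only captures the control flow implied by the abstract (score against quality dimensions, revise with expert feedback, repeat until validated); the actual scoring and revision procedures are described in the body of the paper.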