This paper addresses the problem that human demonstrations used alongside large language model (LLM)-based task planning for embodied AI can degrade policy quality due to unnecessary actions, redundant exploration, and logical errors. To address this, we propose an iterative validation framework in which a judge LLM critiques action sequences and a planning LLM applies the corrections. Unlike rule-based approaches, the method relies on natural-language prompting, enabling broad generalization across a variety of error types, including irrelevant actions, contradictions, and missing steps. On a manually annotated action set from the TEACh embodied AI dataset, the proposed framework achieves up to 90% recall and 100% precision across four state-of-the-art LLMs (GPT-4-mini, DeepSeek-R1, Gemini 2.5, and LLaMA 4 Scout). The refinement loop converges quickly, with 96.5% of sequences requiring at most three iterations, improving both time efficiency and the spatial composition of actions. Importantly, the method preserves human error-recovery patterns, supporting future research on robust corrective behavior. By establishing plan verification as a reliable LLM capability for spatial planning and behavior improvement, this study provides a scalable path to high-quality training data for imitation learning in embodied AI.
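The critique-and-correct loop can be pictured as follows. This is a minimal sketch, not the paper's implementation: the function names `judge_llm` and `planner_llm`, the prompt wording, and the "OK" stopping convention are all illustrative assumptions; only the idea of a judge LLM critiquing a sequence and a planner LLM revising it, with convergence typically within three iterations, comes from the abstract.

```python
# Sketch of the iterative validation loop described in the abstract.
# judge_llm and planner_llm stand in for whatever LLM backends are used;
# their names, prompts, and the "OK" convention are assumptions for illustration.

from typing import Callable, List

MAX_ITERATIONS = 3  # the abstract reports 96.5% of sequences converge within three rounds


def refine_action_sequence(
    actions: List[str],
    judge_llm: Callable[[str], str],
    planner_llm: Callable[[str], str],
    max_iterations: int = MAX_ITERATIONS,
) -> List[str]:
    """Ask a judge LLM to critique an action sequence, then a planning LLM
    to apply the corrections, repeating until the judge accepts the result."""
    for _ in range(max_iterations):
        critique = judge_llm(
            "Review this action sequence for irrelevant actions, contradictions, "
            "and missing steps. Reply 'OK' if none are found:\n" + "\n".join(actions)
        )
        if critique.strip().upper() == "OK":
            break  # judge found no remaining errors; accept the sequence as-is
        revised = planner_llm(
            "Rewrite the action sequence to address this critique, one action per line.\n"
            f"Critique: {critique}\nSequence:\n" + "\n".join(actions)
        )
        # Parse the planner's revision back into a list of actions.
        actions = [line.strip() for line in revised.splitlines() if line.strip()]
    return actions
```

In this sketch the judge and planner are kept as separate callables so that different models (or the same model with different prompts) can fill the two roles, mirroring the two-LLM division of labor described above.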