This paper addresses the problem that both large language model (LLM)-based task plans for embodied AI and the corresponding human demonstrations can degrade policy quality through unnecessary actions, redundant exploration, and logical errors. To address this, we propose an iterative validation framework in which a judge LLM critiques action sequences and a planner LLM applies the corresponding corrections, producing progressively cleaner and spatially consistent trajectories. Unlike rule-based approaches, the framework relies on natural language prompting, enabling broad generalization across error types, including irrelevant actions, contradictions, and missing steps. On a manually annotated action set from the TEACh embodied AI dataset, the proposed framework achieves up to 90% recall and 100% precision across four state-of-the-art LLMs (GPT-4-mini, DeepSeek-R1, Gemini 2.5, and LLaMA 4 Scout). The refinement loop converges rapidly, with 96.5% of sequences requiring only three iterations, improving both time efficiency and spatial action composition. Importantly, the method preserves human error-recovery patterns rather than removing them, supporting future research on robust correction behaviors. By establishing plan validation as a reliable LLM capability for spatial planning and behavior refinement, this work provides a scalable path to high-quality training data for imitation learning in embodied AI.
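For illustration, a minimal sketch of the critique-and-correct loop described above is given below. The function and parameter names (`refine`, `judge`, `planner`) and the three-iteration budget are assumptions chosen for exposition, not the paper's implementation; in practice `judge` and `planner` would wrap prompted calls to the respective LLMs.

```python
from typing import Callable, List, Tuple

Actions = List[str]    # action sequence, e.g. ["GoTo(sink)", "PickUp(mug)"]
Critique = List[str]   # natural-language issues reported by the judge LLM


def refine(
    actions: Actions,
    judge: Callable[[Actions], Critique],
    planner: Callable[[Actions, Critique], Actions],
    max_iterations: int = 3,  # assumed budget; most sequences converge early
) -> Tuple[Actions, int]:
    """Alternate judge critiques and planner corrections until the judge
    reports no remaining issues or the iteration budget is exhausted."""
    for iteration in range(1, max_iterations + 1):
        critiques = judge(actions)
        if not critiques:                  # judge found no errors: converged
            return actions, iteration
        actions = planner(actions, critiques)  # apply corrections and re-check
    return actions, max_iterations
```

In this sketch the judge returns an empty list once a sequence is clean, so the loop exits as soon as validation succeeds, mirroring the rapid convergence reported above.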