Visual imitation learning (VIL) from long-horizon demonstrations is difficult because the agent must reproduce complex action sequences. To address this challenge, this paper proposes a novel agent framework that integrates two reflection modules to strengthen planning and code generation. A plan generation module produces the action sequence, and a plan reflection module checks it for temporal consistency and spatial alignment. A code generation module then translates the plan into code, and a code reflection module verifies and improves the correctness of that code and its consistency with the plan. Furthermore, we introduce LongVILBench, a new benchmark that includes action sequences of up to 18 steps and emphasizes temporal and spatial complexity, to support systematic evaluation. Experimental results demonstrate that the proposed framework outperforms existing methods.
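The control flow of the four modules can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Step`, `run_pipeline`, the `toy_*` stand-ins, and the `max_rounds` parameter are all hypothetical, and the toy modules replace whatever models the actual framework uses with simple rule-based functions. The sketch only shows the assumed ordering, in which each generation stage is followed by a reflection stage that may revise its output.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One planned action (hypothetical schema, not from the paper)."""
    action: str
    target: str


Demo = List[Tuple[str, str]]
Plan = List[Step]


def run_pipeline(
    demo: Demo,
    generate_plan: Callable[[Demo], Plan],
    reflect_plan: Callable[[Demo, Plan], Plan],
    generate_code: Callable[[Plan], str],
    reflect_code: Callable[[Plan, str], str],
    max_rounds: int = 2,  # assumed cap on reflection rounds
) -> str:
    # Stage 1: draft a plan, then let plan reflection revise it
    # until it stops changing or the round budget is spent.
    plan = generate_plan(demo)
    for _ in range(max_rounds):
        revised = reflect_plan(demo, plan)
        if revised == plan:
            break
        plan = revised

    # Stage 2: draft code from the plan, then let code reflection
    # revise it against the plan in the same way.
    code = generate_code(plan)
    for _ in range(max_rounds):
        revised_code = reflect_code(plan, code)
        if revised_code == code:
            break
        code = revised_code
    return code


# Toy stand-ins. Each generator makes a deliberate mistake so that
# the matching reflection step has something to repair.
def toy_generate_plan(demo: Demo) -> Plan:
    return [Step(a, t) for a, t in demo[:-1]]  # drops the last step


def toy_reflect_plan(demo: Demo, plan: Plan) -> Plan:
    # Temporal check: the plan must follow the demonstrated order.
    return [Step(a, t) for a, t in demo]


def toy_generate_code(plan: Plan) -> str:
    lines = [f"robot.{s.action}('{s.target}')" for s in plan]
    return "\n".join(lines[:-1])  # omits the last call


def toy_reflect_code(plan: Plan, code: str) -> str:
    # Consistency check: one call per plan step, in plan order.
    return "\n".join(f"robot.{s.action}('{s.target}')" for s in plan)


demo = [("pick", "red_block"), ("place", "blue_plate")]
program = run_pipeline(
    demo, toy_generate_plan, toy_reflect_plan,
    toy_generate_code, toy_reflect_code,
)
print(program)
```

In this toy run both reflection stages restore the step that the corresponding generator dropped, so the final program contains one call per demonstrated action. Passing the modules as callables keeps the loop independent of how each module is realized.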