Recent advances in multimodal large language models (MLLMs) provide rich perceptual grounding for code policy generation in embodied agents. However, most existing systems lack effective mechanisms for adaptively monitoring policy execution and recovering code during task completion. This paper introduces HyCodePolicy, a hybrid language-based control framework that systematically integrates code synthesis, geometric grounding, perceptual monitoring, and iterative recovery into a closed-loop programming cycle for embodied agents. Given a natural language instruction, the system first decomposes it into subgoals and generates an initial executable program grounded in object-centric geometric primitives. The program is then executed in simulation, while a vision-language model (VLM) observes selected checkpoints to detect and localize execution failures and infer their causes. By fusing structured execution traces that capture program-level events with VLM-based perceptual feedback, HyCodePolicy infers failure causes and recovers the program. This hybrid dual-feedback mechanism enables self-correcting program synthesis with minimal human supervision. Experimental results demonstrate that HyCodePolicy significantly improves the robustness and sample efficiency of robot manipulation policies, offering a scalable strategy for integrating multimodal reasoning into autonomous decision-making pipelines.
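As a rough illustration of the closed loop described above, the minimal Python sketch below mirrors the generate, execute, monitor, and recover cycle. All names in it (generate_program, run_in_simulation, repair_program, ExecutionTrace) and the simulated checkpoint messages are hypothetical placeholders invented for exposition; they are not the paper's actual API, prompts, or results.

```python
"""Illustrative sketch of the closed-loop programming cycle in the abstract.

Every function body here is a stub: in the real system, an MLLM would
synthesize the program, a simulator would execute it, and a VLM would
inspect checkpoints. Only the control flow is meant to be informative.
"""

from dataclasses import dataclass, field


@dataclass
class ExecutionTrace:
    """Structured program-level events plus VLM observations at checkpoints."""
    events: list = field(default_factory=list)
    vlm_feedback: list = field(default_factory=list)
    success: bool = False


def generate_program(instruction: str) -> str:
    # Placeholder: decompose the instruction into subgoals and emit an
    # executable policy grounded in object-centric geometric primitives.
    return f"# policy for: {instruction}\nmove_to('target'); grasp('target')"


def run_in_simulation(program: str) -> ExecutionTrace:
    # Placeholder: execute the program, log program-level events, and let a
    # VLM review selected checkpoints to detect and localize failures.
    trace = ExecutionTrace()
    trace.events.append("grasp attempted")
    trace.vlm_feedback.append("checkpoint 1: gripper closed on empty space")
    trace.success = False  # pretend the first attempt failed
    return trace


def repair_program(program: str, trace: ExecutionTrace) -> str:
    # Placeholder: fuse the structured trace with VLM feedback to infer the
    # failure cause and patch the offending subgoal.
    return program + "\n# recovery: re-localize target before grasping"


def closed_loop(instruction: str, max_iters: int = 3) -> str:
    """Generate, execute, monitor, and recover until success or budget runs out."""
    program = generate_program(instruction)
    for _ in range(max_iters):
        trace = run_in_simulation(program)
        if trace.success:
            break
        program = repair_program(program, trace)
    return program


if __name__ == "__main__":
    print(closed_loop("put the red block in the bowl"))
```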