$E^2C$ is a framework designed to improve the reasoning capability of large language models (LLMs). It separates inference into an exploration phase, which generates high-level strategic plans, and an execution phase, which carries out the selected plan. The framework is trained in two stages, combining supervised fine-tuning (SFT) with reinforcement learning (RL), and the SFT stage incorporates a novel data-generation algorithm that enhances plan adherence. $E^2C$ achieves high accuracy using fewer tokens than competing methods on AIME 2024, and its exploration-focused SFT (EF-SFT) improves cross-domain adaptability, outperforming standard SFT on medical benchmarks.
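The explore-then-execute inference loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `explore`, `score`, and `execute` are hypothetical stand-ins for LLM calls and a plan selector (e.g. a verifier or reward model), and all names are assumptions.

```python
def explore(question: str, n_plans: int = 3) -> list[str]:
    # Exploration phase: sample several short, high-level strategic plans.
    # A real system would draw these from the LLM; here they are stubbed.
    return [f"Plan {i}: high-level strategy {i} for '{question}'"
            for i in range(n_plans)]

def score(plan: str) -> float:
    # Placeholder plan selector; in practice a verifier or reward model
    # would rank candidate plans.
    return float(len(plan))

def execute(question: str, plan: str) -> str:
    # Execution phase: follow the chosen plan to produce the final answer.
    return f"Answer to '{question}' following [{plan}]"

def e2c_infer(question: str) -> str:
    # Two-phase inference: explore candidate plans, pick one, then execute it.
    plans = explore(question)
    best = max(plans, key=score)
    return execute(question, best)
```

Separating planning from execution in this way is what lets the framework spend few tokens on short strategic plans while reserving detailed generation for the single plan that is actually carried out.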