This paper presents a framework for efficient implicit reasoning in large language models (LLMs). Conventional chain-of-thought (CoT) prompting is computationally expensive and slow because the reasoning process must be generated explicitly as text; we therefore propose reasoning directly in a latent space without verbalizing intermediate steps. To this end, we model the latent thought process as a temporally extended abstract action (an option) within a hierarchical reinforcement learning framework, and learn a diverse set of options as latent embeddings using the variational Markov option critic (VMOC) algorithm. We extend the theory of continuous MDP isomorphisms to prove that policy learning in the latent space preserves the optimal solution of the original, more complex problem, and we propose a cold-start procedure that distills human reasoning demonstrations into the latent option space via supervised fine-tuning (SFT). Experiments on complex logical-reasoning benchmarks and locomotion tasks demonstrate the effectiveness of the proposed framework.
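To make the cold-start idea concrete, the sketch below shows one way a decoder could be conditioned on a continuous latent option and trained on reasoning demonstrations with a supervised next-token loss plus a KL regularizer on the option posterior. This is a minimal illustrative assumption, not the paper's VMOC objective: the GRU backbone stands in for the LLM, the diagonal Gaussian posterior, the standard-normal prior, and names such as `LatentOptionReasoner` and `beta` are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentOptionReasoner(nn.Module):
    """Toy stand-in for an LLM whose answer decoder is conditioned on a
    continuous latent option z rather than explicit CoT tokens."""

    def __init__(self, vocab_size=1000, hidden=256, option_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # High-level (option) policy: encodes the question and outputs a
        # diagonal Gaussian over latent options.
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.opt_mu = nn.Linear(hidden, option_dim)
        self.opt_logstd = nn.Linear(hidden, option_dim)
        # Low-level policy: answer decoder conditioned on the sampled option.
        self.decoder = nn.GRU(hidden + option_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def sample_option(self, question_ids):
        h, _ = self.encoder(self.embed(question_ids))
        pooled = h.mean(dim=1)
        mu, logstd = self.opt_mu(pooled), self.opt_logstd(pooled)
        z = mu + torch.randn_like(mu) * logstd.exp()  # reparameterized sample
        return z, mu, logstd

    def forward(self, question_ids, answer_ids):
        z, mu, logstd = self.sample_option(question_ids)
        z_seq = z.unsqueeze(1).expand(-1, answer_ids.size(1), -1)
        dec_in = torch.cat([self.embed(answer_ids), z_seq], dim=-1)
        h, _ = self.decoder(dec_in)
        return self.out(h), mu, logstd

def cold_start_sft_loss(model, question_ids, answer_ids, beta=1e-3):
    """Cold-start SFT: next-token loss on the demonstrated answer plus a KL
    term keeping the option posterior close to a standard-normal prior."""
    logits, mu, logstd = model(question_ids, answer_ids)
    nll = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        answer_ids[:, 1:].reshape(-1),
    )
    kl = 0.5 * (mu.pow(2) + (2 * logstd).exp() - 2 * logstd - 1).sum(-1).mean()
    return nll + beta * kl

# Example usage on random token IDs (hypothetical shapes).
model = LatentOptionReasoner()
q = torch.randint(0, 1000, (4, 16))
a = torch.randint(0, 1000, (4, 8))
loss = cold_start_sft_loss(model, q, a)
loss.backward()
```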