This paper proposes LaDiR (Latent Diffusion Reasoner), a novel reasoning framework that unifies the expressiveness of continuous latent representations with the iterative refinement capability of latent diffusion models to enhance the reasoning performance of large language models (LLMs). LaDiR first constructs a structured latent reasoning space using a Variational Autoencoder (VAE), which preserves semantic information while providing a compact representation of textual reasoning steps. A latent diffusion model then learns to denoise blocks of latent thought tokens, and a blockwise bidirectional attention mask enables longer-horizon, iterative refinement with adaptive test-time compute. On mathematical reasoning and planning benchmarks, LaDiR improves accuracy, diversity, and interpretability over existing autoregressive, diffusion-based, and latent reasoning methods, offering a new paradigm for text reasoning with latent diffusion.
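The core mechanism described above, denoising an entire block of latent thought tokens over several refinement passes rather than emitting tokens one at a time, can be illustrated with a toy sketch. Everything here is hypothetical scaffolding: the real system uses a trained VAE encoder/decoder and a learned diffusion denoiser, whereas this sketch uses random latents and a hand-written update rule purely to show the iterative, whole-block refinement structure.

```python
import numpy as np

# Toy sketch of blockwise iterative refinement in a latent space.
# All names and shapes are illustrative, not the paper's actual model.

rng = np.random.default_rng(0)

BLOCK_LEN, LATENT_DIM = 4, 8   # one block of latent "thought tokens"

def denoise_step(z_t, clean, alpha=0.3):
    """Stand-in for one reverse-diffusion step: nudge the noisy block
    toward the clean latent. In LaDiR this would be a learned denoiser
    conditioned on the timestep and, via bidirectional attention,
    on the other tokens in the block."""
    return z_t + alpha * (clean - z_t)

clean = rng.standard_normal((BLOCK_LEN, LATENT_DIM))  # "true" latent block
z = clean + rng.standard_normal((BLOCK_LEN, LATENT_DIM))  # noised block

errors = []
for t in range(10):            # more steps = more test-time compute
    z = denoise_step(z, clean)
    errors.append(float(np.linalg.norm(z - clean)))

# Each pass refines the entire block at once, so the reconstruction
# error shrinks monotonically across refinement steps.
assert all(a > b for a, b in zip(errors, errors[1:]))
```

The point of the sketch is the control flow: unlike autoregressive decoding, which commits to each token before producing the next, every refinement pass revisits the whole block, which is what allows later evidence to correct earlier latent tokens.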