In this paper, we present Meta-Reasoner, a framework that addresses the computational overhead and error propagation that arise when large language models (LLMs) solve complex problems. Inspired by human metacognition and dual-process theory, Meta-Reasoner dynamically optimizes inference-time reasoning by decoupling high-level guidance from step-by-step generation, letting LLMs think about 'how to think'. It uses a contextual multi-armed bandit to iteratively evaluate reasoning progress, select a strategy (e.g., backtrack, disambiguate, start over, or suggest an alternative), and reallocate computational resources to the most promising path. Through evaluations on mathematical reasoning and puzzles, we demonstrate that this dynamic guidance has the potential to overcome the inherent challenges of LLM inference and provide a scalable, adaptable solution for a wide range of applications.
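To make the strategy-selection loop concrete, below is a minimal sketch of a contextual multi-armed bandit (here, a LinUCB variant) choosing among the four advisory strategies named above. The specific bandit algorithm, context features, and reward signal are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

# The four advisory strategies named in the abstract.
STRATEGIES = ["backtrack", "disambiguate", "start_over", "suggest_alternative"]

class LinUCBStrategySelector:
    """Contextual multi-armed bandit (LinUCB) over reasoning strategies.

    Each arm keeps a ridge-regression estimate of expected reward given a
    context vector summarizing reasoning progress. Arm choice balances the
    estimated reward against an upper-confidence exploration bonus.
    """

    def __init__(self, n_features: int, alpha: float = 1.0):
        self.alpha = alpha  # exploration strength
        # Per-arm statistics: A = I + sum(x x^T), b = sum(reward * x).
        self.A = {s: np.eye(n_features) for s in STRATEGIES}
        self.b = {s: np.zeros(n_features) for s in STRATEGIES}

    def select(self, context: np.ndarray) -> str:
        """Pick the strategy with the highest upper confidence bound."""
        scores = {}
        for s in STRATEGIES:
            A_inv = np.linalg.inv(self.A[s])
            theta = A_inv @ self.b[s]                      # reward estimate
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores[s] = theta @ context + bonus
        return max(scores, key=scores.get)

    def update(self, strategy: str, context: np.ndarray, reward: float):
        """Fold the observed progress reward back into the chosen arm."""
        self.A[strategy] += np.outer(context, context)
        self.b[strategy] += reward * context

# Illustrative use: the context features (normalized round number and a
# self-reported progress score) and the reward value are hypothetical.
selector = LinUCBStrategySelector(n_features=2)
context = np.array([0.3, 0.7])                 # [round, progress estimate]
choice = selector.select(context)              # strategy for this round
selector.update(choice, context, reward=0.8)   # reward from progress check
```

In this sketch, compute reallocation is implicit: arms whose past guidance correlated with higher progress rewards accumulate higher estimated payoffs, so later rounds of generation are steered toward them.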