This paper presents a novel approach to reducing the high cost of prompt-based reasoning methods, which improve the performance of large language models (LLMs) by expending additional computation at inference time. Drawing on a computational model of metareasoning from cognitive science, we propose a method for training LLMs to generate intermediate reasoning steps selectively, only when they are necessary. We design a reward function that penalizes unnecessary reasoning and use it to train LLMs via expert iteration. Experimental results on a variety of datasets show that, compared with conventional few-shot chain-of-thought prompting and STaR, the proposed method reduces token generation by 20-37% across three models while maintaining task performance.
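
To make the penalized reward concrete, the following is a minimal sketch of one plausible form, assuming the reward is +1 for a correct final answer and a fixed penalty is subtracted only when intermediate reasoning was generated but the question could have been answered directly; the function name, arguments, and penalty value are illustrative assumptions, not the paper's exact formulation.

```python
def reasoning_reward(answer_correct: bool,
                     used_reasoning: bool,
                     reasoning_was_needed: bool,
                     penalty: float = 0.5) -> float:
    """Hypothetical reward for expert iteration:
    +1 for a correct answer, minus `penalty` when intermediate
    reasoning steps were generated unnecessarily."""
    reward = 1.0 if answer_correct else 0.0
    if used_reasoning and not reasoning_was_needed:
        # Penalize reasoning tokens spent on a question the model
        # could have answered without them.
        reward -= penalty
    return reward
```

Under this kind of objective, expert iteration would keep only sampled trajectories with high reward, so the model is pushed toward answering directly whenever the extra reasoning does not change correctness.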