This paper addresses mathematical problem solving with language models (LMs), focusing on the limitations of hybrid frameworks that combine chain-of-thought (CoT) reasoning with code execution to leverage the strengths of each. Existing frameworks rely on external instructions or fixed code-integration templates and lack metacognitive awareness, that is, the ability to assess their own abilities and autonomously decide when and how to invoke tools. To address these limitations, we study autonomous code integration, which lets models adapt their tool-usage strategies as their reasoning abilities evolve during training. To tackle the efficiency issues of reinforcement learning (RL), we propose a novel expectation-maximization (EM) framework that couples structured exploration (E-step) with off-policy RL optimization (M-step), creating a mutually reinforcing loop between metacognitive tool-usage decisions and evolving abilities. Experimental results show that the proposed method achieves superior performance through enhanced exploration; in particular, our 7B model improves by more than 11% on MATH500 and 9.4% on AIME without o1-like CoT. A schematic sketch of this alternating loop is given below.
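
The following is a minimal, illustrative sketch of how an EM-style loop alternating structured exploration (E-step) and off-policy RL updates (M-step) could be organized; it is written under our own assumptions, and all function and parameter names (e.g., `policy.sample`, `verify_answer`, `update_off_policy`) are hypothetical placeholders rather than the paper's actual implementation.

```python
def em_training_loop(policy, problems, num_rounds=10, samples_per_problem=8):
    """Schematic EM-style training loop (assumed interface, not the paper's API)."""
    for _ in range(num_rounds):
        replay_buffer = []

        # E-step: structured exploration. For each problem, sample trajectories
        # under both tool-usage modes (pure CoT vs. CoT with code execution) so
        # neither branch is pruned prematurely by the current decision policy.
        for problem in problems:
            for use_code in (False, True):
                for _ in range(samples_per_problem // 2):
                    trajectory = policy.sample(problem, force_code=use_code)
                    # Assumed binary correctness reward from an answer checker.
                    trajectory.reward = verify_answer(problem, trajectory)
                    replay_buffer.append(trajectory)

        # M-step: off-policy RL optimization. Update the policy on the collected
        # trajectories, which were generated under forced tool-usage modes and
        # therefore require an off-policy (e.g., importance-weighted) update.
        policy.update_off_policy(replay_buffer)

    return policy
```

In this sketch, the E-step supplies the M-step with trajectories covering both tool-usage decisions, and the improved policy in turn shapes the next round of exploration, mirroring the mutually reinforcing loop described above.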