
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization

Created by
  • Haebom

Author

Haozhe Wang, Long Li, Chao Qu, Fengming Zhu, Weidi Xu, Wei Chu, Fangzhen Lin

Outline

This paper addresses mathematical problem solving with language models (LMs), focusing on hybrid frameworks that combine chain-of-thought (CoT) reasoning with code execution to leverage the strengths of each. The authors point out that existing frameworks rely on external instructions or fixed code-integration templates and lack metacognitive awareness: the ability to dynamically assess the model's own capabilities and autonomously decide when and how to integrate tools. To overcome these limitations, they study autonomous code integration, which lets a model adapt its tool-use strategy as its reasoning ability evolves during training. To address the sample-efficiency issues of standard reinforcement learning (RL), they propose a novel expectation-maximization (EM) framework that alternates structured exploration (E-step) with off-policy RL optimization (M-step), creating a mutually reinforcing loop between metacognitive tool-use decisions and the model's evolving capabilities. Experiments show that the method achieves superior results through enhanced exploration; in particular, a 7B model improved by more than 11% on MATH500 and 9.4% on AIME without o1-like long CoT.
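The alternation the summary describes can be caricatured in a few lines. The sketch below is a toy illustration, not the paper's method: the reward model (`solve`) and the single scalar "policy" (probability of invoking the code tool) are hypothetical stand-ins for an LM and its tool-use decision. The E-step samples trajectories under both tool-use choices; the M-step performs a reward-weighted (off-policy-style) update of the tool-use preference.

```python
import random

def solve(problem, use_code):
    # Hypothetical reward model standing in for an LM: code execution is
    # assumed more reliable on "hard" problems, plain CoT suffices on "easy".
    if use_code:
        return 1.0 if random.random() < 0.9 else 0.0
    return 1.0 if problem["difficulty"] == "easy" or random.random() < 0.4 else 0.0

def em_step(problems, p_code, n_samples=8):
    # E-step: structured exploration — sample both tool-use decisions
    # and record the reward of every trajectory.
    trajectories = []
    for prob in problems:
        for _ in range(n_samples):
            use_code = random.random() < p_code
            trajectories.append((use_code, solve(prob, use_code)))
    # M-step: reward-weighted update of the tool-use policy, computed
    # from the fixed batch collected above (off-policy flavor).
    total = sum(r for _, r in trajectories)
    if total == 0:
        return p_code  # no signal this round; keep the current policy
    return sum(r for used, r in trajectories if used) / total

random.seed(0)
p = 0.5                                   # start with no tool preference
problems = [{"difficulty": "hard"}] * 4
for _ in range(10):
    p = em_step(problems, p)
print(round(p, 2))                        # drifts toward code use on hard problems
```

On hard problems the reward mass concentrates on code-integrated trajectories, so the preference climbs toward 1; on easy problems it would not, mimicking the adaptive "to code or not to code" decision at a cartoon level.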

Takeaways, Limitations

Takeaways: The paper presents a novel EM framework for autonomous code integration and experimentally demonstrates that it overcomes the limitations of existing reinforcement-learning-based approaches, yielding improved performance. Equipping a language model with metacognitive tool-use decisions makes mathematical problem solving more effective.
Limitations: Further research is needed on the generality and scalability of the proposed EM framework, and its applicability to other types of problems and tools remains to be verified. The reported performance gains are limited to specific benchmarks (MATH500, AIME), so generalizability to other domains requires further study.