Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Created by
  • Haebom

Author

Qikai Chang, Zhenrong Zhang, Pengfei Hu, Jun Du, Jiefeng Ma, Yicheng Pan, Jianshu Zhang, Quan Liu, Jianqing Gao

Overview

Although large language models (LLMs) have made remarkable progress in mathematical reasoning, they still struggle with high-precision tasks such as numerical computation and formal symbolic manipulation. Integrating external tools has emerged as a promising way to close this gap. However, existing methods face three major challenges: constructing tool-integrated reasoning data, performing fine-grained optimization, and enhancing inference. To overcome these limitations, we propose THOR (Tool-Integrated Hierarchical Optimization via RL). First, we introduce TIRGen, a pipeline for building a high-quality dataset of tool-integrated reasoning paths that is aligned with the policy model and generalizes well across diverse models. Second, to perform fine-grained hierarchical optimization, we introduce an RL strategy that jointly optimizes episode-level problem solving and step-level code generation. This is based on the key insight that the success of intermediate tool calls is a strong predictor of the correctness of the final answer. Finally, THOR incorporates a self-correction mechanism that uses immediate tool feedback to dynamically revise erroneous reasoning paths during inference. THOR generalizes strongly across a wide range of models and works well with both reasoning and non-reasoning models. It also achieves state-of-the-art performance among models of similar scale on multiple mathematical benchmarks, while delivering consistent improvements on code benchmarks.
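The hierarchical optimization described above pairs an episode-level signal (is the final answer correct?) with a step-level signal for each code call. The summary does not give THOR's exact objective, so the following is only a minimal Python sketch under stated assumptions: the episode reward is a 0/1 correctness score normalized within a group of sampled solutions (a GRPO-style choice assumed here), each code step earns a fixed reward when it executes successfully, and `Trajectory`, `hierarchical_signal`, and `step_weight` are hypothetical names, not THOR's API.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Trajectory:
    """One sampled solution (hypothetical structure, not THOR's data format)."""
    answer_correct: bool  # episode-level outcome: is the final answer right?
    step_success: list[bool] = field(default_factory=list)  # did each tool call run without error?


def episode_advantages(group: list[Trajectory]) -> list[float]:
    """Episode-level signal: 0/1 answer reward normalized within the group (GRPO-style assumption)."""
    rewards = [1.0 if t.answer_correct else 0.0 for t in group]
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:  # all samples equally right or wrong: no relative signal
        return [0.0] * len(group)
    return [(r - mu) / sigma for r in rewards]


def step_rewards(traj: Trajectory, step_weight: float = 0.5) -> list[float]:
    """Step-level signal: a weighted reward for every tool call that ran successfully (hypothetical)."""
    return [step_weight if ok else 0.0 for ok in traj.step_success]


def hierarchical_signal(group: list[Trajectory], step_weight: float = 0.5) -> list[dict]:
    """Pair each trajectory's episode advantage with its per-step rewards."""
    return [
        {"episode": adv, "steps": step_rewards(traj, step_weight)}
        for adv, traj in zip(episode_advantages(group), group)
    ]


if __name__ == "__main__":
    group = [
        Trajectory(answer_correct=True, step_success=[True, True]),
        Trajectory(answer_correct=False, step_success=[False, True]),
    ]
    for item in hierarchical_signal(group):
        print(item)
```

Keeping the two signals separate lets a trainer credit or penalize individual code calls independently of the episode outcome, which is one way to act on the insight that tool-call success predicts final-answer accuracy. How THOR actually weights and combines the two levels is not specified in this summary.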

Takeaways, Limitations

  • THOR addresses three key challenges: constructing tool-integrated reasoning data, fine-grained optimization, and enhancing inference.
  • It builds a high-quality tool-integrated reasoning dataset with TIRGen.
  • It performs hierarchical optimization with an RL strategy that operates at both the episode level and the step level.
  • It dynamically corrects reasoning errors at inference time through a self-correction mechanism driven by tool feedback.
  • It generalizes strongly across a variety of models.
  • It achieves state-of-the-art performance among similar-scale models on math benchmarks and consistent gains on code benchmarks.
  • Code will be released at https://github.com/JingMog/THOR.
  • The paper does not explicitly state its own limitations, although it discusses the limitations of existing methods.
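The self-correction mechanism listed above can be pictured as a retry loop around each tool call. The summary does not say how THOR revises a failed path (resampling, backtracking, or rewriting the step), so this Python sketch simply regenerates the failing step with the tool's error message appended to the context. `generate` is a hypothetical stand-in for the model, `run_python` and `solve_step_with_correction` are illustrative names, and executing model-written code in a plain subprocess is acceptable only for a toy example like this one.

```python
import subprocess
import sys
from typing import Callable


def run_python(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run a code snippet in a fresh interpreter and return (succeeded, stdout or stderr)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "TimeoutExpired"
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


def solve_step_with_correction(
    generate: Callable[[str], str], context: str, max_retries: int = 2
) -> tuple[str, bool, str]:
    """Generate one code step; if the tool call fails, feed the error back and regenerate.

    `generate` is a hypothetical model stand-in mapping the current context to code.
    Returns (final code, whether it succeeded, tool output).
    """
    for _ in range(max(0, max_retries) + 1):
        code = generate(context)
        ok, output = run_python(code)
        if ok:
            return code, True, output
        lines = output.strip().splitlines()
        last_line = lines[-1] if lines else "unknown error"
        # Append the immediate tool feedback so the next attempt can revise the faulty step.
        context += f"\n{code}\n# Tool error: {last_line}"
    return code, False, output


if __name__ == "__main__":
    drafts = iter(["print(1/0)", "print(6 * 7)"])  # first draft fails, second is fixed
    code, ok, out = solve_step_with_correction(lambda ctx: next(drafts), "Compute 6 * 7.")
    print(ok, out.strip())  # prints: True 42
```

Bounding the retries keeps a persistently failing step from looping forever; in that case the function returns the last attempt together with its error output.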