Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning

Created by
  • Haebom

Authors

Wei Sun, Qianlong Du, Fuwei Cui, Jiajun Zhang

Outline

In this paper, we propose EpicPRM, a novel framework for improving the mathematical reasoning capability of large language models (LLMs). Existing ways of generating training data for process-supervised reward models (PRMs) are either costly, as with manual annotation, or noisy, as with per-step Monte Carlo estimation. EpicPRM annotates each intermediate reasoning step by quantifying its contribution to reaching the correct answer, and uses an adaptive binary search algorithm to improve both the accuracy and the efficiency of annotation. With this framework, the authors efficiently construct Epic50k, a high-quality process-supervision training dataset of 50,000 annotated intermediate steps. PRMs trained on Epic50k significantly outperform those trained on existing public datasets. Epic50k is available on GitHub.
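The efficiency gain comes from not Monte Carlo-checking every step: a wrong solution has a single first erroneous step, so an annotator can binary-search for it with O(log n) correctness queries instead of one per step. Below is a minimal Python sketch of that binary-search idea, not the authors' implementation; `rollout_success_rate`, `sample_fn`, `check_fn`, and the fixed `threshold` are hypothetical stand-ins, and the adaptive refinements described in the paper are not reproduced here.

```python
def rollout_success_rate(problem, prefix_steps, sample_fn, check_fn, n_rollouts=8):
    """Hypothetical Monte Carlo estimate: from a given prefix of reasoning
    steps, sample n_rollouts completions and measure how often they reach
    the correct final answer."""
    successes = sum(
        check_fn(problem, sample_fn(problem, prefix_steps))
        for _ in range(n_rollouts)
    )
    return successes / n_rollouts

def locate_first_error(problem, steps, sample_fn, check_fn, threshold=0.0):
    """Binary-search for the earliest step whose prefix can no longer reach
    the correct answer. Returns the zero-based index of the first erroneous
    step, or None if the full solution still succeeds.

    Assumes the empty prefix succeeds (the problem is solvable) and that
    correctness is monotone: once a prefix fails, longer prefixes fail too."""
    if rollout_success_rate(problem, steps, sample_fn, check_fn) > threshold:
        return None  # every step is fine; nothing to locate

    # Invariant: steps[:lo] succeeds, steps[:hi] fails.
    lo, hi = 0, len(steps)
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if rollout_success_rate(problem, steps[:mid], sample_fn, check_fn) > threshold:
            lo = mid  # prefix up to mid still recoverable; error lies later
        else:
            hi = mid  # prefix already fails; error is at or before step mid
    return hi - 1  # first step whose inclusion makes the prefix fail
```

Each query samples fresh completions from the prefix, so the cost per annotated solution scales with the number of rollouts times O(log n) rather than times n, which is the kind of saving the paper attributes to its adaptive binary search.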

Takeaways, Limitations

Takeaways:
  • EpicPRM, a new framework that effectively improves the mathematical reasoning ability of LLMs.
  • Construction and public release of a high-quality process-supervision dataset (Epic50k).
  • An annotation method that is more efficient and accurate than existing approaches.
  • PRMs trained on Epic50k outperform those trained on existing public datasets.

Limitations:
  • Further research is needed to determine whether EpicPRM's performance generalizes to other types of reasoning problems and other LLM architectures.
  • At 50,000 annotated steps, Epic50k may be relatively small compared to larger public datasets.
  • Optimal parameter settings for the adaptive binary search algorithm require further study.