Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners

Created by
  • Haebom

Author

Miao Peng, Nuo Chen, Zongrui Suo, Jia Li

Outline

In this paper, we present a study on applying a Process Reward Model (PRM) to graph reasoning problems to improve the reasoning ability of large language models (LLMs). To address the high cost of manually creating step-level supervision data, we construct a large-scale graph reasoning dataset called GraphSILO, which generates detailed reasoning steps and step-wise labels using task-oriented trajectories and Monte Carlo Tree Search (MCTS). On this dataset we train GraphPRM, the first PRM for graph reasoning problems, and evaluate its effectiveness in two settings: inference-time scaling and reinforcement learning via Direct Preference Optimization (DPO). Experimental results show that GraphPRM significantly improves LLM performance across 13 graph reasoning tasks, including a 9% gain for the Qwen2.5-7B model. In addition, we demonstrate transferability to unseen graph reasoning datasets and to new reasoning domains such as mathematical problem solving. The performance improvements on GSM8K and Math500 highlight the cross-domain applicability of graph-based reasoning rewards.
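The inference-time scaling setup described above can be sketched as PRM-guided best-of-N selection: sample several candidate reasoning chains, score each step with the reward model, and keep the chain whose aggregated score is highest. The sketch below is illustrative only; `score_step` is a hypothetical stub standing in for a trained PRM such as GraphPRM, and the min-aggregation is one common choice, not necessarily the paper's.

```python
# Hedged sketch of PRM-guided best-of-N selection over reasoning chains.
# `score_step` is a stub for a trained process reward model (e.g. GraphPRM);
# a real PRM would be a fine-tuned LLM scoring the partial solution.

def score_step(question: str, steps_so_far: list[str]) -> float:
    """Stub PRM: returns an estimate that the partial solution is on track."""
    last = steps_so_far[-1]
    # Toy heuristic so the sketch runs: prefer short, non-empty steps.
    return 1.0 / (1 + len(last)) if last else 0.0

def chain_score(question: str, steps: list[str]) -> float:
    """Aggregate per-step rewards; taking the minimum is a common PRM choice."""
    return min(score_step(question, steps[: i + 1]) for i in range(len(steps)))

def best_of_n(question: str, candidates: list[list[str]]) -> list[str]:
    """Return the candidate reasoning chain with the highest aggregated score."""
    return max(candidates, key=lambda steps: chain_score(question, steps))
```

In practice the candidates would come from sampling the policy LLM at a nonzero temperature, with the PRM acting as the verifier that replaces (or complements) answer-level majority voting.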

Takeaways, Limitations

Takeaways:
We show that applying a PRM to graph reasoning problems improves the reasoning ability of LLMs.
We demonstrate that automated data generation can build a large-scale, high-quality graph reasoning dataset.
We demonstrate that GraphPRM improves LLM performance on a variety of graph reasoning tasks and transfers to other domains (e.g., mathematical problem solving).
We verify the cross-domain transferability of graph-based reasoning rewards.
Limitations:
The procedure for constructing the GraphSILO dataset is not described in detail.
The types and characteristics of the graph reasoning tasks used are not described in detail.
Comparative analysis against other PRM-based methods is lacking.
Further research is needed to establish how well the experimental results generalize.