Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning

Created by
  • Haebom

Authors

Xiaojun Guo, Ang Li, Yifei Wang, Stefanie Jegelka, Yisen Wang

Outline

This paper highlights the limitations of large language models (LLMs) on graph-related tasks and proposes G1, a novel approach to address them. G1 substantially improves the graph reasoning abilities of LLMs by applying reinforcement learning (RL) to synthetic graph-theoretic tasks. To this end, the authors construct Erdős, a large-scale graph reasoning dataset covering 50 graph-theoretic tasks of varying difficulty, with 100,000 training examples and 5,000 test examples. A 3B model trained with RL on this data outperforms Qwen2.5-72B-Instruct and exhibits strong zero-shot generalization to new tasks, domains, and graph encoding schemes. The study suggests that RL fine-tuning of LLMs on synthetic data is an efficient, scalable, and robust way to build graph reasoning models. The source code and dataset are publicly available.
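The core ingredient is synthetic graph-theory problems whose answers can be checked programmatically. The snippet below is only a hypothetical sketch of how one such training example might be constructed; the task choice (shortest-path length), prompt wording, and graph parameters are illustrative assumptions rather than the paper's actual templates, and networkx is used purely for convenience.

```python
# Hypothetical sketch: building one synthetic graph-reasoning example with a
# verifiable answer, in the spirit of the Erdős dataset. Task choice
# (shortest-path length), prompt wording, and graph parameters are
# illustrative assumptions, not the paper's actual templates.
import random
import networkx as nx

def make_example(n_nodes: int = 8, edge_prob: float = 0.3, seed: int = 0) -> dict:
    """Return one prompt/answer pair built from a random Erdős-Rényi graph."""
    graph = nx.erdos_renyi_graph(n_nodes, edge_prob, seed=seed)
    while graph.number_of_edges() == 0:  # retry until the graph has edges
        seed += 1
        graph = nx.erdos_renyi_graph(n_nodes, edge_prob, seed=seed)

    # Pick two nodes from the largest connected component so the answer exists.
    component = sorted(max(nx.connected_components(graph), key=len))
    source, target = random.Random(seed).sample(component, 2)

    edge_list = ", ".join(f"({u}, {v})" for u, v in graph.edges())
    prompt = (
        f"You are given an undirected graph with nodes 0..{n_nodes - 1} and "
        f"edges: {edge_list}. What is the length of the shortest path "
        f"from node {source} to node {target}? End with 'Answer: <number>'."
    )
    answer = str(nx.shortest_path_length(graph, source, target))
    return {"prompt": prompt, "answer": answer}

if __name__ == "__main__":
    example = make_example(seed=42)
    print(example["prompt"])
    print("ground truth:", example["answer"])
```

Because every example carries an exact ground-truth answer, correctness can be verified automatically, which is what makes large-scale RL training on these tasks practical.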

Takeaways, Limitations

Takeaways:
  • RL-based fine-tuning of LLMs on synthetic data can effectively improve graph reasoning capabilities (a minimal reward sketch follows this section).
  • A relatively small model outperforming a much larger one points to an efficient way to train graph reasoning models.
  • The RL-trained model shows strong zero-shot generalization to unseen tasks.
  • Performance improves across diverse graph-related tasks (e.g., node classification, link prediction).
  • The public release of the model and dataset supports the reproducibility and scalability of the research.
Limitations:
  • The Erdős dataset is synthetic and may not fully capture the diversity of real-world data.
  • The computational cost of RL-based training can be relatively high.
  • Because the training data is synthetic, generalization to real-world problems may be limited.
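As referenced in the first takeaway, RL on verifiable tasks typically relies on a simple rule-based reward. The sketch below shows what such a reward function could look like; the "Answer: <value>" extraction pattern and the binary 0/1 reward are assumptions for illustration, not the paper's exact reward design.

```python
# Minimal sketch of a rule-based verifiable reward for RL fine-tuning on graph
# tasks. The "Answer: <value>" extraction pattern and the binary 0/1 reward are
# illustrative assumptions; the paper's actual reward design may differ.
import re

def graph_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the final 'Answer:' in the completion matches the label."""
    match = re.search(r"Answer:\s*(\S+)\s*$", model_output.strip())
    if match is None:
        return 0.0  # no parsable final answer -> no reward
    return 1.0 if match.group(1) == ground_truth.strip() else 0.0

# A reward like this can score sampled completions during RL (e.g., PPO/GRPO).
print(graph_reward("1 -> 3 -> 5, so two hops.\nAnswer: 2", "2"))  # 1.0
print(graph_reward("I think it is 4.\nAnswer: 4", "2"))           # 0.0
```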