Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

A Survey of Reinforcement Learning for Large Reasoning Models

Created by
  • Haebom

Author

Kaiyan Zhang, Yuxin Zuo, Bingxiang He, Youbang Sun, Runze Liu, Che Jiang, Yuchen Fan, Kai Tian, Guoli Jia, Pengfei Li, Yu Fu, Xingtai Lv, Yuchen Zhang, Sihang Zeng, Shang Qu, Haozhan Li, Shijie Wang, Yuru Wang, Liu, Zonglin Li, Huayu Chen, Xiaoye Qu, Yafu Li, Weize Chen, Zhenzhao Yuan, Junqi Gao, Dong Li, Zhiyuan Ma, Ganqu Cui, Zhiyuan Liu, Biqing Qi, Ning Ding, Bowen Zhou

Outline

This paper surveys recent advances in reinforcement learning (RL) for reasoning with large language models (LLMs). It highlights RL's contributions to solving complex logical tasks, such as mathematics and coding, and its importance in transforming LLMs into large reasoning models (LRMs). It also discusses key challenges in scaling RL for LRMs, particularly in terms of computational resources, algorithm design, training data, and infrastructure, and suggests future research directions. The survey reviews research that has applied RL to LLMs and LRMs to enhance reasoning capabilities since the release of DeepSeek-R1, exploring advances and future opportunities in the field.

Takeaways, Limitations

Takeaways:
RL improves the reasoning ability of LLMs and is particularly effective for complex tasks such as mathematics and coding.
RL has emerged as a key methodology for transforming LLMs into LRMs.
This paper reassesses the progress in this field and suggests future research directions.
The paper provides insights by analyzing research conducted since the release of DeepSeek-R1.
Limitations:
Scaling RL-based LRMs faces challenges in terms of computational resources, algorithm design, training data, and infrastructure.
The paper does not directly address specific methodological or experimental limitations.
There may be a lack of specific strategies for scaling toward ASI (Artificial Superintelligence).