This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Created by
Haebom
Author
Yang Zhou, Sunzhu Li, Shunyu Liu, Wenkai Fang, Jiale Zhao, Jingwen Yang, Jianwei Lv, Kongcheng Zhang, Yihe Zhou, Hengtong Lu, Wei Chen, Yan Xie, Mingli Song
Outline
This paper presents a method for leveraging reinforcement learning (RL) to improve the reasoning capability of large language models (LLMs). Existing RL-based LLM training depends on high-quality samples, but exploration is bottlenecked by the model's own limited ability to generate such samples in the first place. To address this, the paper proposes a novel framework, Rubric-Scaffolded Reinforcement Learning (RuscaRL). RuscaRL uses checklist-style rubrics in two ways: during rollout generation, the rubrics act as scaffolding that elicits diverse, high-quality responses; during model training, they serve as the basis for reliable reward signals. As a result, RuscaRL outperforms existing methods on various benchmarks. In particular, it improves Qwen2.5-7B-Instruct from 23.6 to 50.3 on HealthBench-500, surpassing GPT-4.1, and lifts Qwen3-30B-A3B-Instruct to 61.1, surpassing OpenAI-o3.
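The two rubric roles described above can be sketched in a few lines of code. This is an illustrative outline only, not the paper's implementation: the function names (`rubric_reward`, `scaffolded_prompt`), the linear scaffolding cutoff, and the keyword-matching stub judge are assumptions; in practice the per-criterion check would be performed by an LLM grader.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RubricItem:
    criterion: str  # one checklist entry, e.g. "advises consulting a doctor"
    weight: float   # relative importance of this criterion

def rubric_reward(response: str,
                  rubric: List[RubricItem],
                  judge: Callable[[str, str], bool]) -> float:
    """Reward = weighted fraction of rubric criteria the response satisfies."""
    total = sum(item.weight for item in rubric)
    earned = sum(item.weight for item in rubric
                 if judge(response, item.criterion))
    return earned / total

def scaffolded_prompt(question: str,
                      rubric: List[RubricItem],
                      step: int,
                      decay_steps: int) -> str:
    """Include the rubric as guidance early in training, then fade it out
    so the model learns to reason without the scaffold."""
    if step < decay_steps:
        hints = "\n".join(f"- {item.criterion}" for item in rubric)
        return f"{question}\n\nAddress the following points:\n{hints}"
    return question

# Stub judge for demonstration: checks whether the criterion's last
# keyword appears in the response (a real system would ask an LLM).
def stub_judge(response: str, criterion: str) -> bool:
    return criterion.split()[-1] in response

rubric = [RubricItem("mentions risks", 1.0),
          RubricItem("mentions benefits", 1.0)]
print(rubric_reward("covers risks and benefits", rubric, stub_judge))  # 1.0
print(rubric_reward("only risks here", rubric, stub_judge))            # 0.5
```

The key design choice reflected here is that the same rubric drives both exploration (via the prompt scaffold) and evaluation (via the reward), which is what ties the rollout and training phases together.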
Takeaways, Limitations
•
Takeaways:
◦
We demonstrate that the reasoning ability of LLMs can be effectively improved through a reinforcement learning framework (RuscaRL) built on checklist-style rubrics.
◦
Achieved state-of-the-art performance on various benchmarks, particularly outperforming GPT-4.1 on HealthBench-500.
◦
Rubric-based exploration and reward strategies offer an effective methodology for improving LLMs' reasoning ability.
•
Limitations:
◦
Research is currently in progress, and the code, model, and dataset will be released at a later date.
◦
Performance may depend heavily on rubric quality, yet detailed descriptions and guidelines for rubric design are lacking.
◦
Generalization across diverse types of reasoning problems has not been fully evaluated.