Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ReviewRL: Towards Automated Scientific Review with RL

Created by
  • Haebom

Authors

Sihang Zeng, Kai Tian, Kaiyan Zhang, Yuru Wang, Junqi Gao, Runze Liu, Sa Yang, Jingxuan Li, Xinwei Long, Jiaheng Ma, Biqing Qi, Bowen Zhou

Outline

This paper proposes ReviewRL, a reinforcement learning (RL) framework for automated paper review, aimed at easing the strain on peer review caused by growing submission volumes and reviewer fatigue. ReviewRL combines three components: an ArXiv-MCP retrieval-based context generation pipeline that brings in relevant scientific literature, supervised fine-tuning that establishes baseline reviewing skills, and an RL procedure that uses a compound reward function to improve both review quality and rating accuracy. Experiments on ICLR 2025 papers show that ReviewRL outperforms existing methods on both rule-based metrics and model-based quality assessment. The implementation will be released on GitHub.
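To make the idea of a compound reward concrete, here is a minimal Python sketch of how a review-quality term and a rating-accuracy term might be combined into one scalar. It is an illustration only, not the authors' reward: the names `RewardWeights`, `rating_accuracy_reward`, and `compound_reward`, the equal weights, the 1 to 10 rating scale, and the assumption that a judge model returns a quality score in [0, 1] are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; this summary does not say how the terms are balanced.
    quality: float = 0.5
    rating: float = 0.5


def rating_accuracy_reward(predicted: float, reference: float,
                           min_score: float = 1.0,
                           max_score: float = 10.0) -> float:
    """Map the absolute rating error onto [0, 1]; 1.0 means an exact match.

    The 1-10 scale is an assumption made for this sketch.
    """
    span = max_score - min_score
    error = min(abs(predicted - reference), span)
    return 1.0 - error / span


def compound_reward(quality_score: float,
                    predicted_rating: float,
                    reference_rating: float,
                    weights: RewardWeights = RewardWeights()) -> float:
    """Weighted sum of a quality score in [0, 1] and the rating-accuracy term."""
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must lie in [0, 1]")
    rating_term = rating_accuracy_reward(predicted_rating, reference_rating)
    return weights.quality * quality_score + weights.rating * rating_term


if __name__ == "__main__":
    # A review judged 0.8 for quality whose rating matches the reference exactly.
    print(compound_reward(0.8, predicted_rating=6.0, reference_rating=6.0))
```

A weighted sum is only one possible design. It lets a single scalar drive the RL update while keeping each term inspectable, but the paper may combine its reward terms differently.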

Takeaways, Limitations

Takeaways:
Demonstrates the potential of an automated paper review system built on reinforcement learning.
Addresses weaknesses of existing automated review systems in factual accuracy, rating consistency, and analytical depth.
Improves review quality by drawing on relevant scientific literature.
Outperforms existing methods on both rule-based metrics and model-based quality assessment.
Limitations:
Experimental results are reported only for ICLR 2025 papers, so generalizability to other venues and fields still needs to be verified.
There is a lack of detailed explanation of the design and optimization of compound reward functions.
Once the implementation is released on GitHub, its practical usability and effectiveness will still need to be evaluated.
Further research is needed to determine whether it can fully replace the expertise and insight of human reviewers.