Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Created by
  • Haebom

Author

Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han

Outline

This paper presents Search-R1, a framework that uses reinforcement learning (RL) to train a large language model (LLM) to generate search queries during its own reasoning process and to incorporate the real-time retrieval results into subsequent reasoning. Search-R1 optimizes the LLM's reasoning trajectory through multi-turn retrieval interactions, and it relies on retrieved-token masking and a simple outcome-based reward function for stable RL training. Experiments on seven question-answering datasets show that Search-R1 outperforms existing RAG baselines by 41% with Qwen2.5-7B and 20% with Qwen2.5-3B. The paper also provides empirical insights into RL optimization methods, LLM choice, and response-length dynamics in retrieval-augmented reasoning. The code and model checkpoints are publicly available.
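
The sketch below illustrates the three mechanisms the summary names: the multi-turn rollout loop, the retrieved-token loss mask, and the outcome-based reward. It is a minimal sketch, not the authors' implementation: `generate` and `retrieve` are hypothetical stand-ins for the policy model and the search engine, the word-level token list is a simplification of real token IDs, and the `<search>`/`<information>`/`<answer>` tags follow the paper's prompt convention.

```python
# Minimal sketch of the Search-R1 training structure (illustrative assumptions,
# not the authors' code): rollout loop, retrieved-token mask, outcome reward.
import re
from typing import Callable, List

def rollout(generate: Callable[[str], str],
            retrieve: Callable[[str], str],
            prompt: str,
            max_turns: int = 4) -> str:
    """Multi-turn loop: the policy generates until it emits a <search> query;
    the retrieved text is appended inside <information> tags and generation
    resumes, until an <answer> appears or the turn budget runs out."""
    context = prompt
    for _ in range(max_turns):
        segment = generate(context)          # policy continues the trajectory
        context += segment
        query = re.search(r"<search>(.*?)</search>", segment, re.DOTALL)
        if query is None:                    # no new query: answered or done
            break
        passages = retrieve(query.group(1))  # external search-engine call
        context += f"<information>{passages}</information>"
    return context

def retrieved_token_mask(tokens: List[str]) -> List[int]:
    """Loss mask: 1 for model-generated tokens, 0 inside <information> spans,
    so the policy gradient never flows through text the model did not write."""
    mask, inside = [], False
    for tok in tokens:
        if "<information>" in tok:
            inside = True
        mask.append(0 if inside else 1)
        if "</information>" in tok:
            inside = False
    return mask

def outcome_reward(trajectory: str, gold: str) -> float:
    """Simple outcome-based reward: exact match on the final answer span."""
    m = re.search(r"<answer>(.*?)</answer>", trajectory, re.DOTALL)
    norm = lambda s: " ".join(s.lower().split())
    return 1.0 if m and norm(m.group(1)) == norm(gold) else 0.0
```

In training, the mask would be applied elementwise to the per-token RL loss so that only model-generated tokens receive gradient, which is what makes optimization stable despite long retrieved passages in the context.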

Takeaways, Limitations

Takeaways:
Presents a novel methodology for enhancing the search capability of LLMs with reinforcement learning.
Demonstrates stable RL training and performance gains through multi-turn retrieval interactions and retrieved-token masking (see the sketch above).
Verifies generalizability through experiments on multiple LLMs and datasets.
Supports reproducibility and follow-up research by releasing the code and model checkpoints.
Limitations:
Experiments are limited to specific LLMs and datasets; broader evaluation across more models and benchmarks is needed.
The simplicity of the outcome-based reward function may cap performance; more sophisticated reward design is worth exploring.
Results depend on the characteristics of the underlying search engine; application to, and comparison across, different search engines is needed.