This paper presents Search-R1, a framework that uses reinforcement learning (RL) to train a large language model (LLM) to generate search queries during its own reasoning and to incorporate the retrieved results in real time. Search-R1 optimizes the LLM's reasoning trajectory through multi-turn retrieval interactions, and it stabilizes RL training with retrieved-token masking and a simple outcome-based reward function. Across seven question-answering datasets, Search-R1 outperforms existing RAG baselines by 41% with Qwen2.5-7B and 20% with Qwen2.5-3B. The paper also provides empirical insights into RL optimization methods, LLM choices, and response length dynamics. The code and model checkpoints are publicly available.
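For illustration, the sketch below shows one way the interleaved generate-search-generate loop and retrieved-token masking described above could be structured. The `<search>`, `<information>`, and `<answer>` tags follow the paper's convention; the `generate` and `retrieve` callables, the character-span bookkeeping, and all other details are assumptions made for this sketch, not the paper's actual implementation.

```python
# A minimal sketch of an interleaved reasoning-and-retrieval rollout,
# assuming hypothetical `generate` and `retrieve` callables supplied by the caller.
import re
from typing import Callable, List, Tuple

def rollout(question: str,
            generate: Callable[[str], str],   # LLM continuation given the text so far
            retrieve: Callable[[str], str],   # search engine returning passages for a query
            max_turns: int = 4) -> Tuple[str, List[Tuple[int, int]]]:
    """Roll out until an <answer> appears or the turn budget is spent.

    Returns the full trajectory text and the character spans of retrieved
    passages, which would be excluded from the RL loss (retrieved-token masking).
    """
    text = question
    masked_spans: List[Tuple[int, int]] = []
    for _ in range(max_turns):
        text += generate(text)
        if "<answer>" in text:                      # terminal answer produced
            break
        queries = re.findall(r"<search>(.*?)</search>", text, re.DOTALL)
        if not queries:                             # no search request this turn
            break
        passages = retrieve(queries[-1].strip())    # call the external search engine
        start = len(text)
        text += f"\n<information>{passages}</information>\n"
        masked_spans.append((start, len(text)))     # retrieved tokens: no policy gradient
    return text, masked_spans
```

Masking the retrieved spans matters because the policy should only receive gradient signal for tokens it generated itself, not for text pasted in by the search engine.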
Takeaways and Limitations
• Takeaways:
◦ A novel methodology for enhancing an LLM's search capability with reinforcement learning.
◦ Stable RL training and performance gains are demonstrated through multi-turn retrieval interactions and retrieved-token masking.
◦ Generalizability is verified through experiments on a range of LLMs and datasets.
◦ Publicly released code and model checkpoints support reproducibility and follow-up research.
• Limitations:
◦ Experiments cover a limited set of LLMs and datasets; broader evaluation across more models and benchmarks is needed.
◦ The simplicity of the outcome-based reward may limit performance; a more sophisticated reward design may be needed (the sketch after this list illustrates the simple outcome reward in question).
◦ Results depend on the characteristics of the search engine; application to, and performance comparison across, different search engines is needed.
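The paper describes the reward as a simple outcome-based score on the final answer, with no intermediate shaping. The sketch below shows one way such an exact-match reward could look; the answer extraction and normalization details are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch of an outcome-based (exact-match) reward on the final answer.
import re
import string

def extract_answer(trajectory: str) -> str:
    """Pull the text between the last pair of <answer> tags, if any."""
    matches = re.findall(r"<answer>(.*?)</answer>", trajectory, re.DOTALL)
    return matches[-1].strip() if matches else ""

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles before comparison (assumed normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(w for w in text.split() if w not in {"a", "an", "the"})

def outcome_reward(trajectory: str, gold_answer: str) -> float:
    """1.0 if the predicted answer exactly matches the gold answer, else 0.0."""
    return float(normalize(extract_answer(trajectory)) == normalize(gold_answer))
```

Because the reward is binary and applied only at the end of the trajectory, it is cheap to compute and hard to game, but it gives no credit for partially correct reasoning or well-formed intermediate searches, which is the limitation noted above.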