
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Created by
  • Haebom

Author

Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan Arik, Dong Wang, Hamed Zamani, Jiawei Han

Outline

This paper addresses how large language models (LLMs) can acquire external knowledge and up-to-date information to improve their reasoning and text generation. To overcome the limitations of existing approaches that combine LLMs with search engines, the authors introduce Search-R1, a framework based on reinforcement learning (RL). In Search-R1, the LLM autonomously generates multiple search queries during step-by-step reasoning and optimizes its reasoning process using the retrieved results. Stable RL training is achieved through a token-masking technique and a simple result-based reward function. Experiments on seven question-answering datasets show that Search-R1 improves performance by 41% with Qwen2.5-7B and 20% with Qwen2.5-3B compared to existing RAG baselines. The paper also presents empirical analyses of RL optimization methods, LLM choice, and search result length dynamics. Code and model checkpoints are publicly available on GitHub.
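A minimal sketch (not the authors' actual code) of the interleaved reason-then-search rollout described above: the policy LLM alternates between reasoning, issuing search queries, and consuming retrieved passages until it emits a final answer. The `llm_generate` and `search_engine` callables and the tag-based prompt are illustrative assumptions standing in for the policy model and retriever.

```python
# Hypothetical sketch of a Search-R1-style rollout loop.
# `llm_generate` and `search_engine` are placeholder callables, not the
# authors' actual interface (see their GitHub repository for the real code).

def rollout(question, llm_generate, search_engine, max_turns=4):
    trajectory = (
        "Answer the question. Reason inside <think>...</think>, issue queries "
        "inside <search>...</search>, and give the final answer inside "
        "<answer>...</answer>.\n"
        f"Question: {question}\n"
    )
    for _ in range(max_turns):
        # Continue the trajectory; assume generation stops right before
        # </search> or </answer> and the stop string is not returned.
        segment = llm_generate(trajectory, stop=["</search>", "</answer>"])
        trajectory += segment

        if "<answer>" in segment:
            # Final answer reached: return the text after the <answer> tag.
            return segment.split("<answer>")[-1].strip()

        if "<search>" in segment:
            # Extract the query, call the search engine, and append the
            # retrieved passages in <information> tags so the next
            # generation step can condition on them.
            query = segment.split("<search>")[-1].strip()
            passages = search_engine(query, top_k=3)
            trajectory += (
                "</search>\n<information>\n"
                + "\n".join(passages)
                + "\n</information>\n"
            )
    return ""  # no answer produced within the turn budget
```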

Takeaways, Limitations

Takeaways:
Demonstrates that reinforcement learning can enable LLMs to interact effectively with search engines and improve reasoning performance.
Presents a robust RL training recipe based on token masking and a simple result-based reward function (see the sketch after this list).
Demonstrates the superiority of Search-R1 through experiments on multiple LLMs and datasets.
Provides insight into search result length dynamics during training.
Ensures reproducibility and enables follow-up research by releasing code and model checkpoints.
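A minimal illustration of the token-masking and result-based-reward ideas from the second takeaway, assuming a PPO/GRPO-style per-token policy-gradient objective; the variable names, the exact-match reward, and the loss form are assumptions for illustration, not the authors' implementation.

```python
import torch

def outcome_reward(pred_answer: str, gold_answer: str) -> float:
    # Simple result-based reward: 1.0 if the final answer matches the gold
    # answer after light normalization, else 0.0.
    return float(pred_answer.strip().lower() == gold_answer.strip().lower())

def masked_policy_loss(logprobs, advantages, retrieved_mask):
    """Policy-gradient loss computed over model-generated tokens only.

    logprobs       (T,) log-probabilities of the sampled tokens
    advantages     (T,) per-token advantages derived from the outcome reward
    retrieved_mask (T,) 1.0 for tokens copied from search results, 0.0 otherwise

    Masking retrieved tokens keeps gradients from flowing through text the
    model did not generate itself, which helps stabilize RL training.
    """
    keep = 1.0 - retrieved_mask
    loss = -(logprobs * advantages * keep).sum() / keep.sum().clamp(min=1.0)
    return loss
```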
Limitations:
Experimental results are reported only for specific LLMs and datasets, so further research on generalizability is needed.
Performance on complex questions or across multiple knowledge domains is not thoroughly evaluated.
Further research may be needed on the design of the reward function.
Performance depends on the quality of the underlying search engine.