Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism

Created by
  • Haebom

Authors

Zhiwen Tan, Jiaming Huang, Qintong Wu, Hongxuan Zhang, Chenyi Zhuang, Jinjie Gu

Outline

In this paper, we present a Retrieval-Augmented Generation (RAG) method that strengthens the search and reasoning capabilities of large language models (LLMs) through reinforcement learning (RL), addressing their tendency to produce hallucinated or outdated responses when relying only on static internal knowledge. To overcome the training instability, long inference time, and limited capability caused by the single-query mode of existing RAG methods, we propose a novel training framework called RAG-R1. RAG-R1 trains LLMs to adaptively use internal and external knowledge during reasoning, and extends the generation and retrieval process from single-query mode to multi-query parallel processing, which shortens inference time and improves model capability. Extensive experiments on seven question-answering benchmarks show that the proposed method outperforms the best-performing baselines by up to 13.2% while reducing inference time by 11.1%.
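The sketch below illustrates the multi-query parallel retrieval idea described above, contrasted with sequential single-query retrieval; it is not the authors' implementation, and the `retrieve` function is a hypothetical placeholder for a real search backend.

from concurrent.futures import ThreadPoolExecutor
from typing import List

def retrieve(query: str, top_k: int = 3) -> List[str]:
    """Hypothetical retriever call; replace with an actual search backend."""
    return [f"doc for '{query}' #{i}" for i in range(top_k)]

def single_query_retrieval(queries: List[str]) -> List[str]:
    # Baseline: queries are issued one after another in sequential rounds.
    docs: List[str] = []
    for q in queries:
        docs.extend(retrieve(q))
    return docs

def multi_query_parallel_retrieval(queries: List[str]) -> List[str]:
    # RAG-R1-style idea: the model emits several queries at once and the
    # retrieval calls run concurrently, cutting wall-clock inference time.
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        results = pool.map(retrieve, queries)
    return [doc for batch in results for doc in batch]

if __name__ == "__main__":
    qs = ["who proposed RAG-R1?", "what is multi-query parallel retrieval?"]
    print(multi_query_parallel_retrieval(qs))

In this toy version the latency gain comes purely from issuing retrieval calls concurrently; the paper additionally trains the model with RL to decide when and what to search.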

Takeaways, Limitations

Takeaways:
  • Shows that RAG-based LLMs can achieve both shorter inference time and better performance.
  • Presents an efficient way to utilize knowledge through multi-query parallel processing.
  • Experimentally verifies performance gains and reduced inference time over the best-performing baselines on seven question-answering benchmarks.
Limitations:
  • Further research is needed on the generalization performance of the proposed method.
  • Robustness to different types of questions has not yet been evaluated.
  • Results are limited to specific benchmarks, so their generalizability needs further review.