Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty

Created by
  • Haebom

Authors

Peilin Wu, Mian Zhang, Xinlu Zhang, Xinya Du, Zhiyu Zoey Chen

Analyzing Retrieval Behavior in Agentic Retrieval-Augmented Generation (RAG) Systems and Proposing $\beta$-GRPO to Improve Their Efficiency

Outline

Agentic RAG systems enhance LLMs through dynamic, multi-step reasoning and information retrieval, but they can exhibit inefficient retrieval behaviors: oversearching (retrieving redundant information) and undersearching (failing to retrieve necessary information). In this study, we define and quantify these behaviors and show that they are prevalent across multiple QA datasets and agentic RAG systems. We also identify a significant link between these inefficiencies and the model's uncertainty about its own knowledge boundaries: response accuracy correlates with how uncertain the model is about its retrieval decisions. To address this, we propose $\beta$-GRPO, a reinforcement learning-based training method that incorporates a confidence threshold to reward high-certainty retrieval decisions. Experiments on seven QA benchmarks show that $\beta$-GRPO improves the agentic RAG ability of a 3B model, which outperforms other strong baselines with a 4% higher average exact match score.
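The idea of rewarding only high-certainty retrieval decisions can be illustrated with a minimal sketch. It is not the authors' implementation: the summary above does not specify how confidence is measured or what value the threshold takes, so this sketch assumes that the confidence of a search call is the minimum token probability of its query, that `beta=0.5` is a placeholder threshold, and that rewards are binary. The function names `search_confidence`, `beta_reward`, and `group_advantages` are hypothetical. The group-normalized advantage follows the standard GRPO formulation (reward minus group mean, divided by group standard deviation).

```python
import math
from statistics import mean, pstdev


def search_confidence(token_logprobs):
    """Confidence of one search call.

    ASSUMPTION: confidence is the minimum token probability of the
    generated search query. An empty query is treated as fully confident.
    """
    return min((math.exp(lp) for lp in token_logprobs), default=1.0)


def beta_reward(answer_correct, search_calls_logprobs, beta=0.5):
    """Binary reward gated by a confidence threshold.

    Returns 1.0 only when the final answer is correct AND every search
    call in the rollout has confidence >= beta. The value of beta here
    is a placeholder, not a value taken from the paper.
    """
    if not answer_correct:
        return 0.0
    for token_logprobs in search_calls_logprobs:
        if search_confidence(token_logprobs) < beta:
            return 0.0
    return 1.0


def group_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantage: normalize rewards within one group
    of rollouts sampled for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical rollouts for the same question:
    # (answer_correct, list of per-search-call token log-probabilities)
    rollouts = [
        (True, [[-0.1, -0.2]]),   # correct, confident search
        (True, [[-0.1, -1.5]]),   # correct, but one low-confidence token
        (False, [[-0.05]]),       # confident search, wrong answer
        (True, []),               # correct without searching
    ]
    rewards = [beta_reward(ok, calls) for ok, calls in rollouts]
    print(rewards)  # [1.0, 0.0, 0.0, 1.0]
    print(group_advantages(rewards))
```

Under these assumptions, a rollout that reaches the right answer through a low-confidence search receives the same reward as a wrong answer, so the policy is pushed toward searching only when it is certain a search is needed.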

Takeaways, Limitations

We quantitatively analyzed the search inefficiencies (oversearching and undersearching) of agentic RAG systems.
We uncovered a link between search inefficiency and uncertainty about the model's knowledge boundaries.
We propose $\beta$-GRPO based on reinforcement learning to improve search efficiency and performance.
With $\beta$-GRPO, a 3B model outperformed other strong baselines on seven QA benchmarks.
The paper does not present specific limitations.