Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning

Created by
  • Haebom

Author

Lang Mei, Zhihan Yang, Chong Chen

Outline

This paper studies the integration of large language models (LLMs) with search engines, combining an LLM's internal pre-trained knowledge with external information. Reinforcement learning (RL) has emerged as a promising paradigm for improving LLM reasoning through multi-turn interactions with a search engine. However, existing RL-based search agents rely on a single LLM to handle both search planning and question answering (QA), which limits their ability to optimize both capabilities simultaneously.

Observing that sophisticated AI search systems typically use large, fixed LLMs (e.g., GPT-4, DeepSeek-R1) to ensure high-quality QA, this paper proposes a more effective and efficient alternative: a small, trainable LLM dedicated to search planning. The authors present AI-SearchPlanner, a novel reinforcement learning framework designed to improve the performance of fixed QA models by focusing on search planning. It achieves this through three key innovations: (1) decoupling the architectures of the search planner and the generator, (2) dual-reward alignment for search planning, and (3) Pareto optimization of planning utility and cost. Extensive experiments on real-world datasets show that AI-SearchPlanner outperforms existing RL-based search agents in both effectiveness and efficiency, and generalizes well across diverse fixed QA models and data domains.
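The utility-vs-cost trade-off behind innovation (3) can be pictured as a scalarized reward that a planner could be trained against. The sketch below is a minimal illustration under assumed conventions, not the paper's implementation; the names (`plan_reward`, `SearchStep`, `lam`) and the example quality scores are all hypothetical.

```python
# Minimal sketch: scoring a search plan by the trade-off between the
# fixed QA model's answer quality (utility) and the plan's search cost.
# All names and numbers here are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class SearchStep:
    query: str
    cost: float  # e.g., latency or token budget consumed by this search call


def plan_reward(qa_quality: float, steps: list[SearchStep], lam: float = 0.1) -> float:
    """Scalarized utility-cost trade-off for one search plan.

    Sweeping `lam` trades answer quality against total search cost;
    plans that cannot improve quality without adding cost sit on the
    Pareto frontier.
    """
    total_cost = sum(s.cost for s in steps)
    return qa_quality - lam * total_cost


# Two candidate plans for the same question.
short_plan = [SearchStep("who wrote X", 1.0)]
long_plan = [SearchStep("who wrote X", 1.0), SearchStep("X publication year", 1.0)]

# Suppose the fixed QA model answers slightly better with the longer plan's context.
r_short = plan_reward(qa_quality=0.80, steps=short_plan)  # 0.80 - 0.1*1.0 = 0.70
r_long = plan_reward(qa_quality=0.85, steps=long_plan)    # 0.85 - 0.1*2.0 = 0.65
```

With this weighting the marginal quality gain of the second search call does not cover its cost, so the shorter plan scores higher, which is exactly the kind of efficiency pressure the paper's cost term applies.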

Takeaways, Limitations

Takeaways:
We demonstrate that the efficiency and effectiveness of RL-based search agents can be improved by dedicating a small, trainable LLM to search planning while relying on a fixed, high-quality QA model for answer generation.
Performance improvements were achieved through novel techniques: decoupling the search planner and generator architectures, dual-reward alignment, and Pareto optimization of planning utility and cost.
It exhibits strong generalization performance across various fixed QA models and data domains.
Limitations:
The performance of the proposed framework may depend on the quality of the fixed QA model used.
The experiments are limited to specific datasets; generalization to other datasets requires further validation.
Further research may be needed on parameter settings for Pareto optimization.