Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning

Created by
  • Haebom

Author

Lang Mei, Zhihan Yang, Chong Chen

Outline

This paper explores integrating a large language model (LLM) with a search engine so that the LLM can combine its pre-trained internal knowledge with external information. Specifically, it uses reinforcement learning (RL) to enhance LLM reasoning through multiple rounds of interaction with the search engine. Existing RL-based search agents rely on a single LLM to handle both search planning and question answering (QA), limiting their ability to optimize both capabilities simultaneously. Given that practical AI search systems typically rely on large, frozen LLMs to ensure high-quality QA, the authors propose AI-SearchPlanner, a novel RL framework that employs a small, trainable LLM dedicated to search planning. AI-SearchPlanner improves the performance of frozen QA models through three key innovations: architectural decoupling of the search planner and generator, dual-reward alignment for search planning, and Pareto optimization of planning utility and cost. Extensive experiments on real-world datasets show that AI-SearchPlanner outperforms existing RL-based search agents in both effectiveness and efficiency, and generalizes well across diverse frozen QA models and data domains.
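The utility-versus-cost trade-off for search plans described above can be sketched as a Pareto filter over candidate plan rollouts plus a scalarized trade-off reward for the planner's RL update. This is a minimal illustrative sketch, not the paper's actual formulation: the `PlanRollout` fields, the dominance rule, and the `lam` weight are all assumptions for demonstration.

```python
from dataclasses import dataclass

@dataclass
class PlanRollout:
    # Hypothetical rollout record for one search plan (illustrative only).
    utility: float  # QA quality gain attributed to the plan (e.g., answer F1 uplift)
    cost: float     # search cost of the plan (e.g., number of search-engine calls)

def pareto_front(rollouts):
    """Keep rollouts that no other rollout dominates.

    Rollout b dominates a if b has utility >= a and cost <= a,
    with at least one strict inequality.
    """
    front = []
    for a in rollouts:
        dominated = any(
            b.utility >= a.utility and b.cost <= a.cost
            and (b.utility > a.utility or b.cost < a.cost)
            for b in rollouts
        )
        if not dominated:
            front.append(a)
    return front

def scalarized_reward(rollout, lam=0.1):
    """A simple utility-minus-weighted-cost reward (lam is an assumed weight)."""
    return rollout.utility - lam * rollout.cost
```

For example, a plan with utility 0.8 at cost 3 and a cheaper plan with utility 0.5 at cost 1 both sit on the Pareto front, while a plan with utility 0.8 at cost 5 is dominated and filtered out; the scalarized reward then lets the RL objective pick a single operating point on that front.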

Takeaways, Limitations

Takeaways:
Presents a novel RL-based search framework (AI-SearchPlanner) that improves both efficiency and effectiveness while leveraging a frozen, high-performance QA model.
Improves performance by decoupling search planning from question answering, with a model optimized for each task.
Balances the quality and efficiency of search plans through dual-reward alignment and Pareto optimization.
Shows strong generalization across various frozen QA models and data domains.
Limitations:
The performance of the proposed method may depend on the quality of the frozen QA model used.
The experimental datasets may be limited in scope; generalization to other datasets requires further verification.
As AI-SearchPlanner's complexity increases, its computational costs may increase.
Additional evaluation is needed for long-horizon search planning and complex queries.