This paper explores the integration of a large language model (LLM) with a search engine, combining the LLM's internal pre-trained knowledge with external information. Specifically, we propose a reinforcement learning (RL) method that enhances LLM inference through multiple rounds of interaction with the search engine. Existing RL-based search agents rely on a single LLM to handle both search planning and question answering (QA), which limits their ability to optimize both capabilities simultaneously. Since practical, sophisticated AI search systems typically employ large, fixed LLMs to ensure high-quality QA, we propose AI-SearchPlanner, a novel reinforcement learning framework that uses a small, trainable LLM dedicated to search planning. AI-SearchPlanner improves the performance of fixed QA models through three key innovations: architectural separation of the search planner and generator, dual-reward sorting for search planning, and Pareto optimization of plan utility and cost. Extensive experiments on real-world datasets demonstrate that AI-SearchPlanner outperforms existing RL-based search agents in both effectiveness and efficiency, and generalizes well across a variety of fixed QA models and data domains.
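To make the utility-cost trade-off concrete, the sketch below shows one plausible way a planner's reward could scalarize the two objectives the abstract names: utility as the QA-quality *gain* from the planner's searches over the fixed QA model alone, and cost as a penalty per search round. The exact reward is not specified here, so `Trajectory`, `planner_reward`, and `cost_weight` are hypothetical names, and the linear scalarization is an assumption, not the paper's method.

```python
from dataclasses import dataclass

# Illustrative sketch only: all names and formulas below are assumptions.
# It conveys the stated idea of trading off plan utility (how much
# searching improves the fixed QA model's answer) against plan cost
# (how many search rounds the planner issues).

@dataclass
class Trajectory:
    qa_score_with_plan: float     # answer quality using the planner's searches
    qa_score_without_plan: float  # answer quality from the fixed QA model alone
    num_search_calls: int         # search rounds the planner requested


def planner_reward(traj: Trajectory, cost_weight: float = 0.1) -> float:
    """Scalarized utility-vs-cost reward (hypothetical form).

    Utility is the gain over the no-search baseline, so the planner is
    rewarded only for searches that actually help the fixed QA model;
    each search round is penalized by `cost_weight`, pushing trained
    policies toward the Pareto frontier of utility versus cost.
    """
    utility = traj.qa_score_with_plan - traj.qa_score_without_plan
    cost = cost_weight * traj.num_search_calls
    return utility - cost


if __name__ == "__main__":
    # A plan that helps (gain 0.4) at the price of two search rounds.
    print(planner_reward(Trajectory(0.9, 0.5, 2)))   # 0.4 - 0.2 = 0.2
    # A plan that searches a lot but barely helps is penalized.
    print(planner_reward(Trajectory(0.55, 0.5, 4)))  # 0.05 - 0.4 = -0.35
```

A fixed weight is only one way to realize Pareto optimization; sweeping `cost_weight` or using a genuinely multi-objective RL scheme would trace out the frontier rather than a single point on it.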