Integrating large language models (LLMs) with search engines allows a system to combine the LLM's internal pre-trained knowledge with external, up-to-date information. In this setting, reinforcement learning (RL) has emerged as a promising paradigm for improving LLM reasoning through multi-turn interactions with a search engine. However, existing RL-based search agents rely on a single LLM to handle both search planning and question answering (QA), which limits their ability to optimize the two capabilities jointly. Since sophisticated AI search systems typically employ large, fixed LLMs (e.g., GPT-4, DeepSeek-R1) to ensure high-quality QA, a more effective and efficient approach is to pair such a model with a small, trainable LLM dedicated to search planning. We propose AI-SearchPlanner, a novel reinforcement learning framework designed to improve the end-to-end performance of fixed QA models by focusing on search planning. This goal is achieved through three key innovations: (1) decoupling the architectures of the search planner and the answer generator, (2) dual reward sorting for search planning, and (3) Pareto optimization of planning utility and cost. Extensive experiments on real-world datasets demonstrate that AI-SearchPlanner outperforms existing RL-based search agents in both effectiveness and efficiency, while generalizing well across diverse fixed QA models and data domains.