This paper focuses on the development of an autonomous single-agent model for Deep Research (DR). Unlike existing multi-agent systems, the model is a single agent that dynamically determines its next action from the current context, equipped with only minimal web crawling and Python tool integration. Rather than starting from pre-trained or instruction-tuned LLMs, we apply continual reinforcement learning (RL) to reasoning-optimized models to enhance their agentic capabilities. We propose a simple RL recipe that uses entirely synthetic data and apply it to various open-source LLMs; the best-performing model, SFR-DR-20B, achieves up to 28.7% on the Humanity's Last Exam benchmark. We also present an in-depth experimental analysis of the proposed methodology.