In this paper, we propose Deliberative Searcher, the first framework that integrates certainty calibration with retrieval-based question answering to improve the reliability of large language models (LLMs). The agent performs multi-step reflection and verification over Wikipedia data and is trained with a reinforcement learning algorithm that optimizes accuracy under a soft confidence constraint. Experimental results show that the proposed method improves the alignment between model confidence and accuracy, yielding more reliable outputs. This paper will be continuously updated.
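To make the training objective concrete, below is a minimal sketch of how a per-sample reward might combine task accuracy with a soft confidence penalty. The function name, the penalty form, and the hyperparameters lam and eps are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch: reward = accuracy minus a soft penalty on overconfidence.
# All names and hyperparameters here are illustrative assumptions.

def compute_reward(correct: bool, confidence: float,
                   lam: float = 0.5, eps: float = 0.05) -> float:
    """Per-sample reward with a soft confidence constraint.

    The penalty activates only when stated confidence overshoots observed
    correctness by more than a tolerance `eps`, relaxing the hard constraint
    confidence - accuracy <= eps into a soft one weighted by `lam`.
    """
    accuracy = 1.0 if correct else 0.0
    overconfidence = max(0.0, confidence - accuracy - eps)
    return accuracy - lam * overconfidence

# A confident wrong answer is penalized harder than a hedged wrong one,
# nudging the model's stated confidence toward its actual accuracy.
print(compute_reward(correct=False, confidence=0.9))  # wrong, overconfident
print(compute_reward(correct=False, confidence=0.2))  # wrong, but hedged
print(compute_reward(correct=True, confidence=0.8))   # right and confident
```

Under this kind of shaping, an RL optimizer can only raise expected reward by either answering more accurately or reporting confidence closer to its true accuracy, which is one plausible mechanism behind the improved confidence-accuracy alignment reported above.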