This paper addresses optimistic exploration, a key issue for improving sample efficiency in reinforcement learning from human feedback (RLHF). We analyze why existing exploration bonus methods fail to achieve optimism under KL- or α-divergence regularization: such regularization biases exploration toward high-probability regions of the reference model, thereby reinforcing conservative behavior. To address this, we propose the Generalized Exploration Bonus (GEB), a novel theoretical framework that satisfies the optimism principle. GEB counteracts the divergence-induced bias through reference-dependent reward adjustments, recovers existing heuristic bonuses as special cases, and extends naturally across the entire α-divergence family. Empirically, GEB consistently outperforms baselines across divergence settings and on alignment tasks with multiple large language model backbones, establishing it as both a principled and practical solution for optimistic exploration in RLHF.
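To ground the terminology, the sketch below writes the standard divergence-regularized RLHF objective alongside an optimistic variant in which a generic reference-dependent bonus $b(x,y)$ is added to the reward; the symbols $\beta$, $D_{\alpha}$, and $b$ are illustrative placeholders and the sketch does not reproduce GEB's exact form.

% Minimal sketch (not the paper's exact formulation): the usual KL-regularized
% RLHF objective, followed by an optimistic variant where a reference-dependent
% bonus b(x, y) offsets the regularizer's pull toward high-probability regions
% of \pi_{\mathrm{ref}}. The bonus form shown is a placeholder, not GEB itself.
\begin{align}
  \max_{\theta}\;
    \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\theta}(\cdot \mid x)}
      \bigl[\, r(x, y) \,\bigr]
    \;-\; \beta\, D_{\mathrm{KL}}\!\bigl( \pi_{\theta}(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \bigr), \\
  \max_{\theta}\;
    \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\theta}(\cdot \mid x)}
      \bigl[\, r(x, y) + b(x, y) \,\bigr]
    \;-\; \beta\, D_{\alpha}\!\bigl( \pi_{\theta}(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \bigr).
\end{align}

In the first objective the divergence term alone discourages sampling outside the reference model's support, which is the conservative bias analyzed in the paper; the second objective illustrates, at a schematic level, how a reference-dependent bonus can counteract that pull for any member of the α-divergence family.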