This paper studies reinforcement learning (RL) for the class of continuous-time stochastic linear-quadratic (LQ) control problems considered in Huang et al. (2024), in which the state process is scalar-valued, the volatility depends on both the state and the control, and running control rewards are absent. We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization by the critic and policy variance by the actor. Unlike the fixed or deterministic exploration schedules used in previous studies (Huang et al., 2024), the proposed adaptive approach improves learning efficiency while requiring minimal tuning. Despite this flexibility, our method achieves a sublinear regret bound that matches the best model-free results for this class of LQ problems, which were previously derived only under fixed exploration schedules. Numerical experiments show that the adaptive exploration accelerates convergence and improves regret performance compared with both non-adaptive model-free and model-based methods.
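To make the setting concrete, the sketch below is a discrete-time, actor-critic toy version of such a scalar LQ problem with state- and control-dependent volatility and a terminal-only reward, in which both the entropy-regularization temperature and the policy variance are adjusted from observed data rather than on a fixed schedule. Everything here is an assumption for illustration: the coefficients A, B, C, D, the learning rates, and the adaptation rule are hypothetical and are not the algorithm or the adaptive mechanism analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar LQ instance (coefficients are illustrative, not taken from the paper):
#   dX_t = (A X_t + B a_t) dt + (C X_t + D a_t) dW_t,  terminal reward -X_T^2,
# with no running control reward, matching the problem class described above.
A, B, C, D = -0.5, 1.0, 0.2, 0.3
T, dt = 1.0, 0.01
n_steps = int(T / dt)

theta, sigma = 0.0, 1.0   # actor: Gaussian policy a ~ N(theta * x, sigma^2)
k, c = 0.0, 0.0           # critic: quadratic value estimate V(x) = k * x^2 + c
temp = 0.1                # entropy-regularization temperature
lr_actor, lr_critic = 5e-4, 1e-2

for episode in range(1000):
    x = rng.normal()
    k_start = k
    for _ in range(n_steps):
        a = theta * x + sigma * rng.normal()
        x_next = x + (A * x + B * a) * dt + (C * x + D * a) * np.sqrt(dt) * rng.normal()
        # Running reward is the entropy bonus only (no running control reward in this class).
        r = temp * 0.5 * np.log(2 * np.pi * np.e * sigma ** 2) * dt
        td = r + (k * x_next ** 2 + c) - (k * x ** 2 + c)   # TD(0) error
        k += lr_critic * td * x ** 2                        # critic semi-gradient step
        c += lr_critic * td
        # Actor policy-gradient step, using the TD error as an advantage estimate.
        theta += lr_actor * td * (a - theta * x) * x / sigma ** 2
        theta = float(np.clip(theta, -3.0, 3.0))            # safeguard against transients
        x = x_next
    # Fold the terminal reward -X_T^2 into the critic via a final correction.
    td_T = -x ** 2 - (k * x ** 2 + c)
    k += lr_critic * td_T * x ** 2
    c += lr_critic * td_T

    # Illustrative data-driven adaptation (not the paper's rule): shrink the temperature
    # and the policy variance as the critic stops moving, instead of following a fixed
    # deterministic exploration schedule.
    signal = min(1.0, abs(k - k_start) / 0.01)
    temp = 0.9 * temp + 0.1 * (0.1 * signal)
    sigma = max(0.1, 0.9 * sigma + 0.1 * signal)

print(f"theta = {theta:.3f}, exploration sigma = {sigma:.3f}, temperature = {temp:.4f}")
```

In this sketch the exploration level is tied to how much the critic is still changing between episodes, which is one simple way to make the schedule data-driven; the paper's adaptive mechanism and its regret analysis are, of course, more involved.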