This paper studies reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems in which the state is scalar-valued and running control rewards are absent, while the volatility of the state process depends on both the state and control variables. We devise an RL algorithm that learns the optimal policy parameters directly, using a model-free approach that relies neither on knowledge of the model parameters nor on their estimation. The main contributions are the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We establish the rate at which the learned policy parameters converge to their optimal values, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor. We validate the theoretical results through simulation studies and demonstrate the effectiveness and reliability of the proposed algorithm. We also compare our method numerically with recent model-based stochastic LQ RL approaches adapted to the state- and control-dependent volatility setting, showing that our method outperforms them in terms of the regret bound.
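For concreteness, one standard episodic regret notion consistent with the stated bound (the notation here is illustrative rather than the paper's own: $\theta_{n}$ denotes the policy parameter used in episode $n$, $\theta^{*}$ the optimal parameter, and $J(\cdot)$ the expected value achieved by the corresponding policy) is
\[
\operatorname{Regret}(N) \;=\; \sum_{n=1}^{N} \bigl( J(\theta^{*}) - J(\theta_{n}) \bigr) \;=\; \tilde{O}\!\bigl(N^{3/4}\bigr),
\]
where $\tilde{O}$ hides logarithmic factors.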