This paper focuses on the characteristics of bootstrapping, i.e., generating new value predictions from previous value predictions, in temporal-difference (TD) learning. Most TD control methods bootstrap from a single action-value function (e.g., Q-learning and Sarsa). In contrast, methods that maintain two asymmetric value functions and learn action values with state values as an intermediate step (e.g., QV-learning and AV-learning) have received comparatively little attention. We analyze these algorithm families in terms of convergence and sample efficiency, showing that while both families are more efficient than Expected Sarsa in the prediction setting, only AV-learning offers a significant advantage over Q-learning in the control setting. Finally, we present Regularized Dueling Q-learning (RDQ), a novel AV-learning algorithm that significantly outperforms Dueling DQN on the MinAtar benchmark.
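For context, the sketch below shows the standard tabular QV-learning updates, in which the state-value function serves as the bootstrapping target for the action values. The step sizes $\alpha$, $\beta$ and discount factor $\gamma$ are illustrative symbols not defined in this abstract, and the equations reflect the textbook form of QV-learning rather than notation taken from the present paper:
\begin{align*}
% State values bootstrap from themselves, as in ordinary TD(0).
V(s_t) &\leftarrow V(s_t) + \beta \bigl[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \bigr], \\
% Action values bootstrap from the state-value estimate instead of from Q itself.
Q(s_t, a_t) &\leftarrow Q(s_t, a_t) + \alpha \bigl[ r_{t+1} + \gamma V(s_{t+1}) - Q(s_t, a_t) \bigr].
\end{align*}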