Revisiting Regularized Policy Optimization for Stable and Efficient Reinforcement Learning in Two-Player Games