This paper studies whether policy gradient methods, a staple of single-agent reinforcement learning, can be safely applied to two-player zero-sum imperfect-information extensive-form games (EFGs). Existing sound methods for EFGs rely on approximating counterfactual values (rather than Q values), which are incompatible with policy gradient methodologies. In this paper, we establish positive results, showing for the first time that a policy gradient method attains provable best-iterate convergence to a regularized Nash equilibrium in self-play. This indicates that policy gradient methods can retain their theoretical convergence guarantees in EFGs while preserving their practical advantages: efficient use of stochastic trajectory feedback and avoidance of importance sampling corrections.
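As a point of reference (not drawn from the paper itself), a regularized Nash equilibrium can be illustrated in the simplest two-player zero-sum setting, a normal-form game with payoff matrix $A$, as the saddle point of a regularized objective; the regularizer $\psi$ and weight $\tau$ below are illustrative assumptions, and the paper's formulation over extensive-form strategy spaces may differ:
\[
(x^\star, y^\star) \in \arg\max_{x \in \Delta_m}\, \min_{y \in \Delta_n}\; x^\top A y \;-\; \tau\,\psi(x) \;+\; \tau\,\psi(y),
\]
where $\Delta_m$ and $\Delta_n$ are probability simplices, $\tau > 0$ is a regularization weight, and $\psi$ is a strongly convex regularizer, e.g., negative entropy $\psi(x) = \sum_i x_i \log x_i$. As $\tau \to 0$, regularized equilibria approach Nash equilibria of the unregularized game; the convergence guarantee stated above concerns self-play policy gradient iterates approaching such a regularized saddle point.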