This paper studies reinforcement learning (RL) for privileged planning in autonomous driving. Existing approaches are predominantly rule-based and scale poorly. RL, by contrast, scales with data and compute and avoids the compounding errors of imitation learning. However, existing RL approaches for autonomous driving rely on complex reward functions that aggregate multiple individual terms, such as progress, position, and orientation rewards. This paper demonstrates that PPO fails to optimize these reward functions as the mini-batch size increases, limiting its scalability. We therefore propose a reward design that optimizes a single intuitive reward: path completion. Violations are penalized either by terminating the episode or by multiplicatively reducing path completion. We show that PPO trained with this simple reward scales well to larger mini-batch sizes and achieves better performance. Training with large mini-batches in turn enables efficient scaling through distributed data parallelism. We scale training to 300 million samples in CARLA and 500 million samples in nuPlan on a single 8-GPU node. The resulting model achieves a Driving Score (DS) of 64 on the CARLA longest6 v2 benchmark, significantly outperforming other RL methods that use more complex rewards. With minimal modifications to the CARLA setup, the same method is also the strongest learning-based planner on nuPlan: it scores 91.3 in non-reactive and 90.6 in reactive traffic on the Val14 benchmark, while being roughly an order of magnitude faster than prior learning-based methods.
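For concreteness, the reward described above can be written schematically as a per-step path-completion gain scaled by multiplicative penalty factors. The notation below ($\mathrm{PC}_t$, $p_{i,t}$, and the split into soft and severe violations) is an illustrative sketch of this idea, not necessarily the paper's exact formulation:
\[
r_t \;=\; \mathrm{PC}_t \cdot \prod_{i} p_{i,t}, \qquad p_{i,t} \in [0, 1],
\]
where $\mathrm{PC}_t$ is the path completion gained at step $t$, each soft violation contributes a factor $p_{i,t} < 1$ that multiplicatively shrinks the reward, and severe violations instead terminate the episode so that no further path completion can be collected.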