This paper aims to improve the performance of Large Language Models (LLMs) in Reinforcement Learning with Verifiable Rewards (RLVR) settings, where dynamically varying generation lengths lead to high gradient variance. To address this problem, we propose Variance-reduced Length-dependent Normalization (VL Norm), which is theoretically shown to yield an unbiased gradient estimate with minimal variance. With its simple implementation, VL Norm overcomes the limitations of existing normalization schemes and performs strongly across a range of experiments; in particular, when integrated into the DAPO algorithm, it achieves up to 2.67 times faster convergence on the CountDown task.
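To make the idea of length-dependent normalization concrete, the sketch below shows one way a per-response, length-dependent normalizer can be applied when aggregating token-level policy-gradient losses over variable-length generations. It is an illustrative sketch under assumptions, not the paper's exact formulation: `length_normalized_pg_loss` and `norm_fn` are hypothetical names, `norm_fn` stands in for whatever length-dependent normalizer VL Norm derives, and PyTorch is assumed.

```python
# Illustrative sketch only: the exact normalizer derived by VL Norm is not
# reproduced here; `norm_fn` is a placeholder for a length-dependent weight.
import torch


def length_normalized_pg_loss(logprobs, advantages, mask, norm_fn):
    """Aggregate per-token policy-gradient losses with a length-dependent weight.

    logprobs:   (B, T) log-probabilities of sampled tokens under the current policy
    advantages: (B,)   scalar advantage per response (e.g., from a verifiable reward)
    mask:       (B, T) 1 for response tokens, 0 for padding
    norm_fn:    maps each response length to its normalization constant
    """
    lengths = mask.sum(dim=1)                                   # per-response token counts
    per_seq = -(advantages[:, None] * logprobs * mask).sum(dim=1)  # summed token losses
    weights = norm_fn(lengths)                                  # length-dependent normalizer
    return (per_seq / weights).mean()


# Example usage with a GRPO-style per-sequence length normalizer as one possible norm_fn:
# loss = length_normalized_pg_loss(lp, adv, mask, norm_fn=lambda L: L.clamp(min=1))
```

The choice of `norm_fn` is exactly what distinguishes existing schemes (e.g., per-sequence versus batch-level normalization); VL Norm's contribution, per the abstract, is selecting this term so that the resulting gradient estimate remains unbiased while its variance is minimized.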