This paper proposes Dynamic Clipping Policy Optimization (DCPO), a novel framework for improving the reasoning capability of large language models through reinforcement learning. To address the zero-gradient problem of the existing GRPO method, DCPO introduces a dynamic clipping strategy that adapts the clipping bounds to token-specific prior probabilities, together with a smooth advantage normalization technique applied across cumulative training steps. DCPO achieves state-of-the-art performance on four benchmarks across four different base models, outperforming the existing GRPO, DAPO, and GSPO methods, with the largest gains on the AIME24 and AIME25 benchmarks. Furthermore, it raises the proportion of non-zero gradients by an average of 28% relative to GRPO, doubles training efficiency compared with DAPO, and markedly reduces the token clipping ratio.
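As a rough illustration of the two ideas named above (the precise definitions appear in the method section), the sketch below shows one plausible form of a probability-dependent clipping bound and a cumulative advantage normalization. The function names, the `base_eps` and `alpha` parameters, and the exact formulas are illustrative assumptions, not the paper's definitions.

```python
from collections import defaultdict
import numpy as np

def dynamic_clip_bounds(prior_prob: float, base_eps: float = 0.2, alpha: float = 0.5):
    """Illustrative token-specific clipping: widen the clip range for tokens the
    old policy assigned low probability, so fewer updates are clipped away
    (assumed functional form, not the paper's exact bound)."""
    eps = base_eps * (1.0 + alpha * (1.0 - prior_prob))
    return 1.0 - eps, 1.0 + eps

class CumulativeAdvantage:
    """Illustrative smooth normalization: standardize each reward against all
    rewards accumulated for the same prompt across training steps, rather than
    against the current batch alone (assumed form)."""
    def __init__(self):
        self.history = defaultdict(list)

    def standardize(self, prompt_id: str, rewards: list[float]) -> np.ndarray:
        self.history[prompt_id].extend(rewards)
        pooled = np.asarray(self.history[prompt_id], dtype=np.float64)
        return (np.asarray(rewards) - pooled.mean()) / (pooled.std() + 1e-8)

# Toy usage: a low-probability token receives a wider clipping range, and
# advantages are normalized against the cumulative reward history of the prompt.
lo, hi = dynamic_clip_bounds(prior_prob=0.05)
adv = CumulativeAdvantage().standardize("prompt_0", [1.0, 0.0, 0.0, 1.0])
```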