DeepSeek-R1 improved the reasoning ability of LLMs through a rule-based reward system, but such discrete reward functions can cause gradient anomalies, unstable optimization, and slow convergence. ReDit mitigates this problem by adding simple random noise to the discrete reward signal. The perturbed reward provides continuous exploratory gradients throughout training, enabling smoother gradient updates and faster convergence. The injected noise also introduces stochasticity into flat reward regions, encouraging the model to explore new policies and escape local optima. Experiments on diverse tasks demonstrate the effectiveness and efficiency of ReDit: on average, it matches the performance of traditional GRPO while using only about 10% of the training steps, and achieves roughly 4% better performance than traditional GRPO when trained for a comparable duration. Visualizations confirm that ReDit significantly alleviates the gradient problem, and a theoretical analysis is provided to further validate these advantages.
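To make the core idea concrete, below is a minimal sketch of reward dithering as described above: a discrete (e.g., 0/1 rule-based) reward is perturbed with zero-mean noise before it is fed to the policy-gradient estimator. The function name `redit_reward`, the choice of Gaussian versus uniform noise, and the default scale `noise_std=0.05` are illustrative assumptions, not the paper's exact configuration.

```python
import torch

def redit_reward(discrete_reward: torch.Tensor,
                 noise_std: float = 0.05,
                 noise_type: str = "gaussian") -> torch.Tensor:
    """Perturb a discrete rule-based reward with zero-mean random noise.

    The perturbed reward replaces the raw discrete reward when computing
    advantages (e.g., in GRPO), so gradients remain non-degenerate even
    in flat reward regions. Hypothetical sketch, not the reference code.
    """
    if noise_type == "gaussian":
        # Zero-mean Gaussian noise with standard deviation noise_std
        noise = torch.randn_like(discrete_reward) * noise_std
    elif noise_type == "uniform":
        # Zero-mean uniform noise on [-noise_std, noise_std]
        noise = (torch.rand_like(discrete_reward) * 2.0 - 1.0) * noise_std
    else:
        raise ValueError(f"unknown noise_type: {noise_type}")
    return discrete_reward + noise

# Example: a batch of binary correctness rewards from a rule-based verifier
rewards = torch.tensor([0.0, 1.0, 1.0, 0.0])
print(redit_reward(rewards))  # e.g., tensor([-0.03, 1.02, 0.96, 0.05])
```

Because the noise is zero-mean, the expected reward is unchanged; only the variance of the reward signal increases, which is what smooths the resulting gradient landscape.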