This paper analyzes the limitations of Group Relative Policy Optimization (GRPO), a reinforcement learning method for strengthening large reasoning models (LRMs), and proposes Discriminative Constrained Optimization (DisCO), a new framework that improves upon it. Built on discriminative learning principles, DisCO aims to eliminate question-level difficulty bias, ensure training stability, and address data imbalance. Experimental results show that DisCO outperforms both GRPO and Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO) in improving the mathematical reasoning ability of supervised fine-tuned (SFT) models.
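To make the difficulty-bias point concrete, the sketch below contrasts GRPO's group-normalized advantage with a simplified discriminative objective of the kind DisCO builds on. It is a minimal illustration, not the paper's exact formulation: the function names are hypothetical, random scores stand in for model-derived sequence log-likelihoods, and the paper's specific scoring functions and KL trust-region constraint are omitted.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-normalized advantages as used by GRPO for one question's
    sampled outputs. With binary rewards, dividing by the group std
    ties a question's effective gradient weight to its difficulty
    (roughly sqrt(p * (1 - p)) in expectation, where p is the success
    rate), so very easy and very hard questions are down-weighted --
    the question-level difficulty bias DisCO sets out to remove."""
    rewards = rewards.float()
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def discriminative_gap(scores: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Illustrative discriminative objective in the spirit of DisCO:
    push the scores of correct outputs above those of incorrect ones,
    giving every question the same weight regardless of difficulty.
    `scores` stands in for a model-derived scoring function such as
    sequence log-likelihoods (an assumption for this sketch)."""
    pos = scores[rewards == 1]
    neg = scores[rewards == 0]
    if pos.numel() == 0 or neg.numel() == 0:
        # All-correct or all-incorrect groups carry no discriminative signal.
        return scores.new_zeros(())
    return pos.mean() - neg.mean()

# Example: 8 sampled answers to one question, 6 of them correct.
rewards = torch.tensor([1, 1, 1, 1, 1, 1, 0, 0])
scores = torch.randn(8)  # stand-in for per-sequence log-likelihoods
print(grpo_advantages(rewards))            # scale depends on the group std
print(discriminative_gap(scores, rewards)) # difficulty-independent gap
```

Because the discriminative gap never divides by a group statistic, its gradient scale is the same for easy and hard questions; the full DisCO method additionally enforces a KL-divergence constraint for training stability, which this sketch leaves out.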