We address the cost of collecting high-quality training examples for fine-tuning a model with Group Relative Policy Optimization (GRPO). To investigate how example difficulty affects GRPO training effectiveness, we compare easy, medium, difficult, and random selection strategies across multiple models and reasoning tasks. Training on the most difficult 10% of examples (those on which the baseline model fails most frequently) yields up to a 47% performance improvement, while training on easy examples yields only minimal improvements of 3-15%. The reason is that difficult examples maintain mixed success/failure outcomes throughout training, whereas easy examples rapidly converge to consistent success, eliminating learning opportunities. Furthermore, models trained on difficult examples exhibit better out-of-distribution generalization, and only models trained on difficult examples achieve meaningful improvements on the AIME2025 benchmark.
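The selection criterion described above can be made concrete with a minimal sketch: score each candidate example by the baseline model's empirical failure rate over a few sampled attempts, then keep the hardest 10%. The function names, the per-example sampling budget, and the `solve` callback below are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, Dict, List


def estimate_failure_rate(example: Dict, solve: Callable[[Dict], bool],
                          n_samples: int = 8) -> float:
    """Fraction of sampled baseline attempts that fail on this example."""
    failures = sum(0 if solve(example) else 1 for _ in range(n_samples))
    return failures / n_samples


def select_difficult(examples: List[Dict], solve: Callable[[Dict], bool],
                     fraction: float = 0.10) -> List[Dict]:
    """Keep the `fraction` of examples the baseline model fails on most often."""
    scored = [(estimate_failure_rate(ex, solve), ex) for ex in examples]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # hardest first
    k = max(1, int(len(scored) * fraction))
    return [ex for _, ex in scored[:k]]
```

In this sketch, `solve` stands in for a single sampled rollout of the baseline model judged against the task's answer checker; swapping the sort direction or the `fraction` value recovers the easy, medium, and random comparison conditions.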