Collecting high-quality training examples for fine-tuning language models is expensive, and budget constraints limit how much data can be acquired. In this study, we compared selection strategies (easy, medium, hard, and random) across multiple models and reasoning tasks to investigate whether example difficulty affects the efficiency of GRPO training. Training on the most difficult 10% of examples (those on which the base model fails most frequently) yielded performance improvements of up to 47%, whereas training on easy examples yielded only 3-15%. The reason is that GRPO needs variance in outcomes within each group of sampled responses to produce a nonzero advantage: difficult examples maintain mixed success/failure outcomes throughout training, whereas easy examples quickly converge to consistent success, eliminating the learning signal. Furthermore, models trained on difficult examples exhibited better out-of-distribution generalization, and only those models achieved significant gains on the AIME2025 benchmark.
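
As a minimal sketch of the mechanism described above (not the study's actual implementation), the snippet below illustrates the group-relative advantage GRPO relies on and a simple hardest-10% selection rule based on base-model pass rates. The function names, group size, and pass-rate values are illustrative assumptions.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward against its own group.

    If every rollout in the group receives the same reward (all correct or
    all wrong), every advantage is zero and the example contributes no
    gradient signal.
    """
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def select_hardest_fraction(pass_rates, fraction=0.10):
    """Pick the examples the base model fails on most often.

    `pass_rates` maps example id -> fraction of sampled rollouts the base
    model got correct; lower pass rate = harder example.
    """
    ranked = sorted(pass_rates, key=pass_rates.get)  # hardest first
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]

if __name__ == "__main__":
    # Easy example: the policy already solves it every time -> all-zero advantages.
    print(group_relative_advantages([1, 1, 1, 1, 1, 1, 1, 1]))
    # Hard example: mixed outcomes -> nonzero advantages, i.e. a learning signal.
    print(group_relative_advantages([0, 1, 0, 0, 1, 0, 0, 0]))

    # Toy pass-rate table (hypothetical values) for the hardest-10% selection.
    pass_rates = {f"ex{i}": rate for i, rate in enumerate(
        [0.9, 0.1, 0.5, 0.0, 0.8, 0.3, 0.95, 0.05, 0.6, 0.2])}
    print(select_hardest_fraction(pass_rates, fraction=0.10))
```

The first call prints a vector of zeros, which is the sense in which easy examples stop providing training signal once the model solves them consistently; the second prints nonzero advantages, reflecting the mixed outcomes that difficult examples sustain throughout training.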