This paper addresses the challenge of securing high-quality training data for fine-tuning language models. Specifically, we experimentally study how to prioritize training examples by difficulty (easy, medium, or difficult, with a random-selection baseline) under a fixed data budget when fine-tuning with Group Relative Policy Optimization (GRPO), across a range of model sizes and types. Using difficulty estimates obtained from multi-sample evaluations of the base model, we compare and analyze four subset-selection policies applied to the same unlabeled data pool. Experimental results show that training on the most difficult examples yields performance gains of up to 47%, whereas easy examples yield the smallest gains, likely because difficult examples provide more learning opportunities during GRPO training. We conclude with practical guidance: under a constrained data budget, prioritizing difficult examples for GRPO fine-tuning on reasoning tasks can significantly improve performance.
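
The selection procedure described above can be illustrated with a minimal sketch, assuming difficulty is scored as one minus the base model's pass rate over k samples per prompt; the paper does not publish its implementation, so the helper names (sample_answers, is_correct) and the exact binning of the "medium" policy are hypothetical.

```python
import random

def estimate_difficulty(prompts, sample_answers, is_correct, k=8):
    """Score each prompt as 1 - pass rate over k samples from the frozen base model.

    sample_answers(prompt, k) and is_correct(prompt, answer) are hypothetical
    helpers standing in for base-model generation and answer checking.
    """
    difficulty = {}
    for p in prompts:
        answers = sample_answers(p, k)                 # k generations from the base model
        passes = sum(is_correct(p, a) for a in answers)
        difficulty[p] = 1.0 - passes / k               # harder prompts have lower pass rates
    return difficulty

def select_subset(difficulty, budget, policy="difficult"):
    """Pick `budget` prompts from the pool under one of the four policies."""
    ranked = sorted(difficulty, key=difficulty.get)    # easiest -> hardest
    if policy == "easy":
        return ranked[:budget]
    if policy == "difficult":
        return ranked[-budget:]
    if policy == "medium":
        start = max(0, len(ranked) // 2 - budget // 2)
        return ranked[start:start + budget]
    return random.sample(ranked, budget)               # "random" baseline
```

In this reading, the chosen subset serves as the prompt set for GRPO fine-tuning, and difficulty is computed once from the base model, so selection adds only inference cost before training.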