Daily Arxiv

This page curates papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets

Created by
  • Haebom

Author

Benjamin Pikus, Pratyush Ranjan Tiwari, Burton Ye

Outline

This paper addresses the cost of collecting high-quality training examples for fine-tuning models with Group Relative Policy Optimization (GRPO). To investigate how example difficulty affects GRPO training, the authors compare easy, medium, hard, and random selection strategies across multiple models and reasoning tasks. Training on the hardest 10% of examples (those on which the base model most frequently fails) yields up to a 47% performance improvement, while training on easy examples yields only minimal gains of 3-15%. The reason is that hard examples maintain mixed success/failure outcomes throughout training, whereas easy examples quickly converge to consistent success, eliminating the learning signal. Furthermore, models trained on hard examples show better out-of-distribution generalization, and only models trained on hard examples achieve meaningful improvements on the AIME2025 benchmark.
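The "mixed success/failure" argument follows directly from how GRPO normalizes rewards within each group of sampled completions. Below is a minimal sketch (not the authors' code), assuming binary correctness rewards: when every rollout for a prompt receives the same reward, the group-relative advantage collapses to zero and that prompt stops contributing gradient signal.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage for one prompt's group of sampled completions:
    each reward minus the group mean, scaled by the group standard deviation."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# A hard example with mixed outcomes carries gradient signal ...
print(group_relative_advantages([1, 0, 0, 1]))  # approx. [ 1. -1. -1.  1.]
# ... while an easy example the model always solves contributes nothing.
print(group_relative_advantages([1, 1, 1, 1]))  # [0. 0. 0. 0.]
```

This is why easy examples that converge to consistent success stop teaching the model anything, while hard examples keep producing non-zero advantages.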

Takeaways, Limitations

• Under an annotation budget, collecting and annotating examples the base model struggles with should be the priority (see the sketch after this list).
• Hard examples provide almost all of the learning value in GRPO fine-tuning.
• Training on hard examples also improves out-of-distribution generalization.
• Because the paper focuses on a limited set of models and tasks, further research is needed to determine how well the findings generalize to other models and tasks.
• No specific methodology is given for determining the difficulty of training examples.
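As an illustration of the first takeaway, here is a minimal sketch (hypothetical names, not the authors' pipeline) of budget-aware selection: difficulty is estimated from the base model's empirical success rate over a handful of rollouts, and only the hardest ~10% of prompts are kept, mirroring the "hardest 10%" strategy described in the outline.

```python
import numpy as np

def select_hardest_fraction(prompts, is_correct, n_rollouts=8, keep_frac=0.10):
    """Keep the prompts the base model fails on most often.

    `is_correct(prompt) -> bool` is a hypothetical callable that samples one
    completion from the base model and checks it against a reference answer;
    needing such a reference is where the annotation cost (and the limitation
    noted above) comes in.
    """
    success_rates = np.array([
        sum(is_correct(prompt) for _ in range(n_rollouts)) / n_rollouts
        for prompt in prompts
    ])
    n_keep = max(1, int(len(prompts) * keep_frac))
    # Lowest success rate = hardest; prioritize these for annotation and GRPO training.
    hardest = np.argsort(success_rates)[:n_keep]
    return [prompts[i] for i in hardest]
```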