This paper analyzes depth and breadth, two key factors for improving the reasoning performance of language models under Reinforcement Learning with Verifiable Rewards (RLVR). We show that the standard GRPO algorithm over-weights medium-accuracy prompts and under-weights low-accuracy prompts, even though the latter are crucial for improving reasoning performance. To address this, we propose Difficulty Adaptive Rollout Sampling (DARS), which rebalances these weights by allocating additional multi-stage rollouts to hard problems. We further expand the breadth of the training data by substantially increasing the batch size and replacing PPO-style mini-batch iterations with full-batch updates over multiple epochs. Finally, we propose DARS-B, which combines DARS with large batch sizes, and demonstrate experimentally that depth and breadth contribute independently to improving reasoning performance in RLVR.
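To make the weighting issue concrete, the following minimal sketch (not the paper's implementation) assumes binary verifiable rewards: for a prompt with empirical accuracy p, GRPO's group-normalized advantages yield a total update magnitude proportional to sqrt(p(1-p)), which peaks at medium accuracy, while a difficulty-adaptive scheme in the spirit of DARS would grant more rollouts to low-accuracy prompts. The function names, the rollout budget, and the linear allocation rule are illustrative assumptions, not the paper's actual schedule.

```python
import numpy as np


def grpo_total_advantage_mass(p: float, n_rollouts: int = 8, eps: float = 1e-6) -> float:
    """Total |advantage| mass GRPO assigns to a prompt with empirical accuracy p,
    assuming binary rewards and group-normalized advantages.
    Correct rollouts get (1-p)/std and incorrect ones -p/std with std = sqrt(p(1-p)),
    so the summed magnitude is ~ 2 * n * sqrt(p * (1-p)), peaking at p = 0.5."""
    std = np.sqrt(p * (1 - p)) + eps
    k = p * n_rollouts  # expected number of correct rollouts in the group
    return k * (1 - p) / std + (n_rollouts - k) * p / std


def dars_extra_rollouts(p: float, max_extra: int = 24) -> int:
    """Illustrative difficulty-adaptive budget (assumption, not the paper's rule):
    the lower the accuracy, the more additional rollouts the prompt receives."""
    return int(round(max_extra * (1 - p)))


if __name__ == "__main__":
    for p in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(f"p={p:.2f}  GRPO weight ~ {grpo_total_advantage_mass(p):5.2f}  "
              f"extra rollouts = {dars_extra_rollouts(p)}")
```

Running the sketch shows the GRPO weight peaking at p = 0.5 and falling off sharply for hard prompts, which is the imbalance that the multi-stage rollouts of DARS are meant to correct.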