Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation

Created by
  • Haebom

Authors

Yuzhen Zhou, Jiajun Li, Yusheng Su, Gowtham Ramesh, Zilin Zhu, Xiang Long, Chenyang Zhao, Jin Pan, Xiaodong Yu, Ze Wang, Kangrui Du, Jialian Wu, Ximeng Sun, Jiang Liu, Qiaolin Yu, Hao Chen, Zicheng Liu, Emad Barsoum

Outline

Reinforcement learning (RL) has played a crucial role in advancing large language models (LLMs), and models such as the GPT-o series, DeepSeek-R1, Kimi-K1.5, Grok 4, and GLM-4.5 have improved their reasoning and coding capabilities through RL training. However, RL training remains computationally expensive, with rollout generation accounting for more than 90% of total runtime. Efficiency is further limited by the long-tailed distribution of rollout response lengths: a few very long responses delay the entire batch and leave GPUs idle. To address this, the authors propose Active Partial Rollouts in Reinforcement Learning (APRIL), which mitigates this long-tail inefficiency. APRIL over-provisions rollout requests during the rollout phase, terminates generation once the target number of complete responses is reached, and recycles incomplete responses for continuation in later steps. Experimental results show that APRIL improves rollout throughput by an average of 22.5% (up to 44%) across commonly used RL algorithms (GRPO, DAPO, and GSPO), accelerates convergence, and achieves an average of 2.1% (up to 8%) higher final accuracy across tasks. APRIL is framework- and hardware-agnostic, is already integrated into the slime RL framework, and can be deployed on both NVIDIA and AMD GPUs.
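The sketch below illustrates the over-provision / early-terminate / recycle loop described above. It is a minimal, hypothetical example and not the slime framework's actual API: the names Rollout, april_rollout_phase, over_provision, and engine.continue_generation are assumptions made for illustration only.

```python
# Hypothetical sketch of an APRIL-style partial-rollout phase.
# Assumed interface: engine.continue_generation(prompt, tokens) advances one
# rollout by a bounded chunk and returns (new_tokens, finished_flag).

from dataclasses import dataclass, field
from typing import List


@dataclass
class Rollout:
    prompt: str
    tokens: List[int] = field(default_factory=list)  # tokens generated so far
    done: bool = False                                # reached EOS or length limit


def april_rollout_phase(prompts, engine, target, over_provision=1.5, carried=None):
    """Run one rollout phase: over-provision requests, stop once `target`
    responses are complete, and return unfinished rollouts for recycling."""
    # Resume partial rollouts carried over from the previous phase first,
    # then top up with fresh prompts beyond the target (over-provisioning).
    pending = list(carried or [])
    budget = max(0, int(target * over_provision) - len(pending))
    pending += [Rollout(p) for p in prompts[:budget]]

    finished = []
    while pending and len(finished) < target:
        # Advance every in-flight rollout by a bounded chunk of tokens.
        for r in pending:
            new_tokens, r.done = engine.continue_generation(r.prompt, r.tokens)
            r.tokens.extend(new_tokens)
        finished += [r for r in pending if r.done]
        pending = [r for r in pending if not r.done]

    # Key APRIL step: unfinished rollouts are not discarded; they are handed
    # back so the next phase continues them from their current token prefix,
    # rather than letting a few long generations stall the whole batch.
    return finished[:target], pending
```

In this reading, the returned `pending` list is passed back in as `carried` on the next training step, which is how the recycled partial generations avoid the GPU idle time caused by long-tail responses.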

Takeaways, Limitations

Takeaways:
APRIL presents a novel technique for improving the efficiency of RL training, targeting the rollout phase.
It demonstrates improved rollout throughput, faster convergence, and higher final accuracy.
It is applicable across RL algorithms (GRPO, DAPO, GSPO) and hardware environments (NVIDIA and AMD GPUs).
Limitations:
The paper does not explicitly discuss its limitations.
The experimental results may be limited to the specific RL algorithms and tasks evaluated.
APRIL's effectiveness may vary with model size and task difficulty.