Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Created by
  • Haebom

Author

Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, Tongkai Yang, Binhang Yuan, Yi Wu

Outline

This paper proposes AReaL, an asynchronous RL system that improves the efficiency of reinforcement learning (RL) for reasoning tasks on large language models (LLMs). Conventional synchronous systems suffer from low GPU utilization because each training step must wait for the longest output in a batch to complete. AReaL completely decouples generation from training: generation workers continuously produce new outputs, while training workers update the model whenever a batch of data has been collected. Through several system-level optimizations, explicit control of data staleness, and a staleness-aware variant of PPO, AReaL keeps RL training stable while significantly increasing GPU utilization. Experiments on math and code reasoning benchmarks demonstrate up to a 2.77x training speedup compared to synchronous systems.

Takeaways, Limitations

Takeaways:
Presents AReaL, an efficient asynchronous system for RL training of large language models.
Significantly improves GPU utilization over synchronous systems, yielding up to a 2.77x training speedup.
Ensures stable RL training through a staleness-aware PPO variant and workload balancing that account for data staleness.
Demonstrates performance improvements on real-world math and code reasoning tasks.
Limitations:
AReaL's performance gains may be limited to the specific benchmarks and hardware environments evaluated.
Further research is needed on generalizability to other RL algorithms and LLM architectures.
Strategies for managing data staleness may benefit from further optimization.
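A staleness-aware PPO update like the one the summary mentions typically keeps the usual clipped surrogate but adds an importance correction for the (possibly stale) behavior policy that generated the sample. The sketch below illustrates one common form of this idea; the argument names and the exact weighting are illustrative assumptions, not AReaL's actual loss.

```python
import math

def staleness_aware_ppo_loss(logp_current, logp_proximal, logp_behavior,
                             advantages, clip_eps=0.2):
    """Clipped PPO surrogate for off-policy (stale) samples.

    The clipped ratio is taken against a recent "proximal" policy, while an
    unclipped importance weight corrects for the behavior policy that
    actually generated the data. All names here are illustrative.
    """
    losses = []
    for lc, lp, lb, adv in zip(logp_current, logp_proximal,
                               logp_behavior, advantages):
        ratio = math.exp(lc - lp)          # ratio vs. the recent proximal policy
        behav_weight = math.exp(lp - lb)   # correction for the stale behavior policy
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        losses.append(-behav_weight * min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

When the sample is perfectly on-policy (all three log-probabilities equal), this reduces to the standard clipped PPO objective.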