This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the page is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Created by
Haebom
Author
Jian Hu, Xibin Wu, Wei Shen, Jason Klein Liu, Zilin Zhu, Weixun Wang, Songlin Jiang, Haoran Wang, Hao Chen, Bin Chen, Weikai Fang, Xianyu, Yu Cao, Haotian Xu, Yiming Liu
Outline
This paper notes that large language models (LLMs) fine-tuned via reinforcement learning from human feedback (RLHF) and reinforcement learning with verifiable rewards (RLVR) can significantly improve human-AI value alignment and raise the upper bound of AI capabilities, especially in reasoning-intensive long-context chain-of-thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks face challenges such as inference bottlenecks and complexity barriers, which limit their accessibility. To address these challenges, the researchers present OpenRLHF, an easy-to-use, scalable, and high-performance open-source RLHF framework built on Ray, vLLM, DeepSpeed, and HuggingFace Transformers. OpenRLHF is designed to lower the barrier to entry for researchers and practitioners through a simplified design, clear code structure, and comprehensive documentation. Experimental results show that OpenRLHF achieves speedups ranging from 1.22x to 1.68x across a range of model sizes compared to state-of-the-art frameworks, while requiring significantly fewer lines of code to implement. OpenRLHF is publicly available as open source and has already been adopted by major institutions to accelerate RLHF research and learning.
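To make the kind of training loop that such a framework orchestrates more concrete, here is a minimal, self-contained Python sketch of a PPO-style RLHF update with a KL penalty against a frozen reference policy. The toy bandit setup, constants, and function names below are illustrative assumptions and not OpenRLHF's actual API; in the real framework, generation is served by vLLM engines, the optimization step is backed by DeepSpeed, and the components are scheduled as Ray workers.

```python
# Conceptual sketch of one RLHF-style PPO update (NOT OpenRLHF's API).
# A toy "policy" over a small vocabulary stands in for an LLM actor.
import torch
import torch.nn.functional as F

VOCAB = 8          # toy "vocabulary" of possible responses
CLIP_EPS = 0.2     # PPO clipping range
KL_COEF = 0.01     # penalty keeping the actor close to the reference policy

# Toy "actor" and frozen "reference" policies: a single logit vector each.
actor_logits = torch.zeros(VOCAB, requires_grad=True)
ref_logits = torch.zeros(VOCAB)
optimizer = torch.optim.Adam([actor_logits], lr=0.1)

def reward_fn(tokens: torch.Tensor) -> torch.Tensor:
    # Stand-in for a learned reward model (or verifiable checker):
    # here it simply prefers higher token ids.
    return tokens.float() / (VOCAB - 1)

for step in range(200):
    # 1) Rollout: sample "responses" from the current actor
    #    (the generation phase handled by vLLM in OpenRLHF).
    with torch.no_grad():
        log_probs_all = F.log_softmax(actor_logits, dim=-1)
        tokens = torch.multinomial(log_probs_all.exp(), num_samples=64, replacement=True)
        old_log_probs = log_probs_all[tokens]
        ref_log_probs = F.log_softmax(ref_logits, dim=-1)[tokens]

    # 2) Score rollouts and subtract a KL penalty toward the reference policy.
    rewards = reward_fn(tokens) - KL_COEF * (old_log_probs - ref_log_probs)
    advantages = rewards - rewards.mean()  # simple baseline instead of a critic

    # 3) PPO clipped surrogate update (the DeepSpeed-backed training step).
    new_log_probs = F.log_softmax(actor_logits, dim=-1)[tokens]
    ratio = (new_log_probs - old_log_probs).exp()
    surrogate = torch.min(ratio * advantages,
                          torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * advantages)
    loss = -surrogate.mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("learned response distribution:", F.softmax(actor_logits, dim=-1).detach())
```

In an actual run, the sampled "tokens" would be full model responses, the reward would come from a reward model or a verifiable-reward checker, and each of these stages would be sharded across GPUs, which is precisely the orchestration burden OpenRLHF aims to hide behind a simple interface.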