Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Created by
  • Haebom

Author

Haozhan Li, Yuxin Zuo, Jiale Yu, Yuhao Zhang, Zhaohui Yang, Kaiyan Zhang, Xuekai Zhu, Yuchen Zhang, Tianxing Chen, Ganqu Cui, Dehui Wang, Dingxiang Luo, Yuchen Fan, Youbang Sun, Jia Zeng, Jiangmiao Pang, Shanghang Zhang, Yu Wang, Yao Mu, Bowen Zhou, Ning Ding

Outline

This paper proposes SimpleVLA-RL, a framework that improves the long-horizon, step-wise action planning of Vision-Language-Action (VLA) models through reinforcement learning (RL). To address existing VLA models' reliance on large-scale supervised fine-tuning (SFT) data and their poor generalization under distribution shift, the framework builds on veRL and introduces VLA-specific trajectory sampling, scalable parallelization, multi-environment rendering, and optimized loss computation. Applied to OpenVLA-OFT, SimpleVLA-RL achieves state-of-the-art performance on LIBERO and, with an exploration-enhancing strategy, surpasses $\pi_0$ on RoboTwin 1.0 & 2.0. The authors also identify a novel phenomenon, "pushcut," in which RL training discovers action patterns not seen in prior training. Overall, SimpleVLA-RL reduces dependence on large-scale data, generalizes robustly, and outperforms SFT on real-world tasks.
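The core recipe described above (sampling batches of trajectories from the policy and updating it from an outcome-based task-success reward) can be illustrated with a minimal, self-contained REINFORCE sketch. The toy environment, state space, and all names below are illustrative assumptions for exposition, not the paper's actual benchmark or implementation.

```python
import math
import random

random.seed(0)

class ToyPushEnv:
    """Toy 1D stand-in for a robot task: reach position GOAL within MAX_STEPS.
    The reward is binary task success, mirroring an outcome-only RL signal."""
    GOAL, MAX_STEPS = 3, 8

    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action):  # action: 0 = move left, 1 = move right
        self.pos += 1 if action == 1 else -1
        self.t += 1
        done = self.pos == self.GOAL or self.t >= self.MAX_STEPS
        reward = 1.0 if self.pos == self.GOAL else 0.0  # sparse 0/1 outcome
        return self.pos, reward, done

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

class TabularPolicy:
    """Softmax policy over 2 actions with one logit pair per integer state."""
    def __init__(self, n_states=16, lr=0.5):
        self.logits = {s: [0.0, 0.0] for s in range(-n_states, n_states)}
        self.lr = lr

    def act(self, s):
        p = softmax(self.logits[s])
        return 0 if random.random() < p[0] else 1

    def update(self, trajectory, advantage):
        # REINFORCE: raise the log-prob of taken actions, scaled by advantage
        for s, a in trajectory:
            p = softmax(self.logits[s])
            for i in range(2):
                grad = (1.0 if i == a else 0.0) - p[i]
                self.logits[s][i] += self.lr * advantage * grad

def rollout(env, policy):
    """Sample one trajectory; return the (state, action) list and its return."""
    s, traj, ret, done = env.reset(), [], 0.0, False
    while not done:
        a = policy.act(s)
        traj.append((s, a))
        s, r, done = env.step(a)
        ret += r
    return traj, ret

# Training loop: sample a batch of trajectories and use the batch-mean
# return as a baseline, so only better-than-average rollouts are reinforced.
env, policy = ToyPushEnv(), TabularPolicy()
for _ in range(200):
    batch = [rollout(env, policy) for _ in range(16)]
    baseline = sum(ret for _, ret in batch) / len(batch)
    for traj, ret in batch:
        policy.update(traj, ret - baseline)

success = sum(rollout(env, policy)[1] for _ in range(100)) / 100
```

In the real framework, the softmax table would be a VLA model, the toy environment a parallelized simulator, and the batch baseline replaced by the optimized loss computation the paper builds on veRL; only the sparse-success-reward structure of the loop is intended to carry over.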

Takeaways, Limitations

Takeaways:
Reinforcement learning can effectively improve the long-horizon planning ability of VLA models.
SimpleVLA-RL reduces dependence on large-scale SFT data and generalizes robustly under distribution shift.
It outperforms SFT-based models on real-world robotic tasks.
The newly observed "pushcut" phenomenon, in which RL training discovers new behavioral patterns, further expands the potential of VLA models.
Limitations:
The performance gains of SimpleVLA-RL may be limited to the evaluated environments (LIBERO, RoboTwin).
Further research is needed on the generality and causes of the "pushcut" phenomenon.
Additional evaluation of generalization across diverse robot platforms and tasks is needed.