Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

IRL-VLA: Training a Vision-Language-Action Policy via Reward World Model

Created by
  • Haebom

Authors

Anqing Jiang, Yu Gao, Yiru Wang, Zhigang Sun, Shuo Wang, Yuwen Heng, Hao Sun, Shichen Tang, Lijuan Zhu, Jinhao Chai, Jijun Wang, Zichong Gu, Hao Jiang, Li Sun

Outline

This paper presents IRL-VLA to address two key challenges of Vision-Language-Action (VLA) models for autonomous driving: existing imitation-learning-based VLA architectures are largely limited to open-loop settings, and closed-loop training relies heavily on high-fidelity sensor simulation. IRL-VLA is a closed-loop reinforcement learning framework that combines a lightweight reward world model, built via inverse reinforcement learning (IRL), with a self-constructed VLA policy. The framework proceeds in three stages: first, the VLA policy is pretrained with imitation learning; second, a lightweight reward world model is built via IRL, enabling efficient closed-loop reward computation; third, the policy is optimized with Proximal Policy Optimization (PPO) guided by the reward world model, balancing safety, comfort, and traffic efficiency. IRL-VLA achieves state-of-the-art performance on the NAVSIM v2 end-to-end driving benchmark and ranked first in the CVPR 2025 Autonomous Driving Grand Challenge.
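
To make the three-stage recipe concrete, the sketch below outlines how the stages could fit together in code. It is a minimal illustration only: every class and function name here (VLAPolicy, RewardWorldModel, train_irl_vla, etc.) is a hypothetical placeholder and does not correspond to the authors' implementation or any released API.

```python
# Minimal sketch of the three-stage IRL-VLA training loop summarized above.
# All names are hypothetical placeholders; the real networks, losses, and
# PPO details are described in the paper, not reproduced here.
import random


class VLAPolicy:
    """Stands in for the vision-language-action policy network."""

    def act(self, observation):
        # Return a driving action (e.g., a planned trajectory); stubbed here.
        return random.random()

    def imitation_update(self, expert_demos):
        # Stage 1: behavior-cloning-style update toward expert trajectories.
        pass

    def ppo_update(self, rollouts, rewards):
        # Stage 3: PPO-style policy-gradient update using rewards from the
        # learned reward world model (optimization details omitted).
        pass


class RewardWorldModel:
    """Stands in for the lightweight IRL-trained reward model that scores
    closed-loop rollouts on safety, comfort, and traffic efficiency."""

    def irl_update(self, expert_demos, policy_rollouts):
        # Stage 2: fit the reward so that expert behavior scores higher than
        # the current policy's behavior (inverse-RL objective).
        pass

    def score(self, rollout):
        # Cheap closed-loop reward, standing in for a high-fidelity simulator.
        return random.random()


def train_irl_vla(expert_demos, num_rl_iters=10):
    policy, reward_model = VLAPolicy(), RewardWorldModel()

    # Stage 1: imitation-learning pretraining of the VLA policy.
    policy.imitation_update(expert_demos)

    # Stage 2: build the lightweight reward world model via inverse RL.
    rollouts = [policy.act(obs) for obs in expert_demos]
    reward_model.irl_update(expert_demos, rollouts)

    # Stage 3: closed-loop RL with PPO, guided by the reward world model.
    for _ in range(num_rl_iters):
        rollouts = [policy.act(obs) for obs in expert_demos]
        rewards = [reward_model.score(r) for r in rollouts]
        policy.ppo_update(rollouts, rewards)
    return policy


if __name__ == "__main__":
    train_irl_vla(expert_demos=[0.0, 1.0, 2.0])
```

The point of the sketch is the division of labor: the reward world model replaces an expensive high-fidelity simulator during Stage 3, which is what makes closed-loop PPO training practical in this framework.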

Takeaways, Limitations

Takeaways:
  • Presents a novel VLA framework (IRL-VLA) that improves the efficiency of closed-loop reinforcement learning by using a lightweight reward world model based on inverse reinforcement learning.
  • Improves autonomous driving performance by combining imitation learning, inverse reinforcement learning, and PPO-based reinforcement learning to balance safety, comfort, and traffic efficiency.
  • Strong performance verified on the NAVSIM v2 benchmark and in the CVPR 2025 Autonomous Driving Grand Challenge.
  • Contributes to the advancement of VLA research for closed-loop autonomous driving.
Limitations:
  • Further evaluation of the generalization ability of the proposed IRL-VLA framework is needed.
  • Additional research is needed to verify performance and ensure safety in real-world environments.
  • The design and training process of the lightweight reward world model are not described in detail.
  • Further research is needed on adaptability to diverse environments and driving situations.