Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model

Created by
  • Haebom

Author

Anqing Jiang, Yu Gao, Yiru Wang, Zhigang Sun, Shuo Wang, Yuwen Heng, Hao Sun, Shichen Tang, Lijuan Zhu, Jinhao Chai, Jijun Wang, Zichong Gu, Hao Jiang, Li Sun

Outline

This paper proposes IRL-VLA to address two key challenges for Vision-Language-Action (VLA) models in autonomous driving: the suboptimal, limited performance of existing imitation-learning-based VLA architectures trained in open-loop settings, and the difficulty of closed-loop training, which relies heavily on high-fidelity sensor simulation. IRL-VLA is a closed-loop reinforcement learning framework built around a lightweight reward world model obtained via inverse reinforcement learning (IRL), together with a self-built VLA policy. The approach proceeds in three steps: first, a VLA architecture is proposed and pretrained via imitation learning; second, a lightweight reward world model is constructed via IRL for efficient closed-loop reward computation; finally, reinforcement learning guided by the reward world model is performed with Proximal Policy Optimization (PPO), balancing safety, comfort, and traffic efficiency. IRL-VLA achieves state-of-the-art performance on the NAVSIM v2 end-to-end driving benchmark and ranked first in the CVPR 2025 Autonomous Driving Grand Challenge.
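
To make the three-step recipe more concrete, below is a minimal sketch (not the authors' code) of how a lightweight multi-head reward world model could score rollouts and drive a clipped-PPO update in place of a high-fidelity simulator. The RewardWorldModel class, the sub-reward weights, and the policy.log_prob interface are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of reward-world-model-guided PPO, assuming hypothetical
# RewardWorldModel and policy.log_prob interfaces (not the paper's implementation).
import torch
import torch.nn as nn


class RewardWorldModel(nn.Module):
    """Lightweight reward model: maps rollout features to a scalar reward
    composed of safety, comfort, and traffic-efficiency heads."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.heads = nn.ModuleDict(
            {k: nn.Linear(128, 1) for k in ("safety", "comfort", "efficiency")}
        )

    def forward(self, rollout_feat: torch.Tensor) -> torch.Tensor:
        h = self.encoder(rollout_feat)
        # Illustrative placeholder weights for combining the sub-rewards.
        weights = {"safety": 1.0, "comfort": 0.5, "efficiency": 0.5}
        return sum(w * self.heads[k](h) for k, w in weights.items()).squeeze(-1)


def ppo_step(policy, reward_model, old_log_probs, states, actions,
             optimizer, clip_eps: float = 0.2) -> float:
    """One clipped-PPO update where the learned reward world model, rather than
    a sensor simulator, supplies the closed-loop reward signal.
    Advantages are simply centered rewards here for brevity."""
    rewards = reward_model(states).detach()
    advantages = rewards - rewards.mean()

    # policy.log_prob(states, actions) is an assumed interface of the VLA policy.
    new_log_probs = policy.log_prob(states, actions)
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    loss = -torch.min(ratio * advantages, clipped * advantages).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design point this sketch tries to capture is that, because rewards come from a cheap learned model rather than high-fidelity sensor simulation, closed-loop policy optimization stays computationally light.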

Takeaways, Limitations

Takeaways:
Improves the efficiency of closed-loop reinforcement learning by using a lightweight, IRL-based reward world model.
Improves autonomous driving performance through a multifaceted reward design that accounts for safety, comfort, and traffic efficiency.
Demonstrates strong performance on the NAVSIM v2 benchmark and in the CVPR 2025 Autonomous Driving Grand Challenge.
Provides a framework that can accelerate VLA research for closed-loop autonomous driving.
Limitations:
Further research is needed on the generalization performance of the proposed method.
Performance needs to be verified in real-world environments.
Further analysis of the accuracy and reliability of the reward world model is needed.