This paper proposes IRL-VLA to address two key challenges facing Vision-Language-Action (VLA) models for autonomous driving: existing imitation learning-based VLA architectures achieve only suboptimal, limited performance in open-loop settings, and closed-loop learning depends heavily on high-fidelity sensor simulation. IRL-VLA is a closed-loop reinforcement learning framework that combines a lightweight reward world model, built via inverse reinforcement learning (IRL), with a self-built VLA architecture. The approach proceeds in three steps: first, a VLA architecture is proposed and pretrained via imitation learning; second, a lightweight reward world model is constructed via IRL for efficient closed-loop reward computation; and third, reward world model-guided reinforcement learning with Proximal Policy Optimization (PPO) is designed to balance safety, comfort, and traffic efficiency. IRL-VLA achieved state-of-the-art performance on the NAVSIM v2 end-to-end driving benchmark and ranked first in the CVPR 2025 Autonomous Driving Grand Challenge.
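The paper summary above does not include code, so the sketch below is only a minimal, hypothetical illustration of the third step: a clipped PPO policy update in which the per-step reward comes from a learned reward world model rather than a sensor simulator. All names (RewardWorldModel, Policy, ppo_update), the three reward heads and their weights, and the state/action dimensions are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): PPO-style update guided by a
# learned reward world model instead of simulator feedback.
import torch
import torch.nn as nn


class RewardWorldModel(nn.Module):
    """Scores a (state, action) pair; stands in for the IRL-trained reward model."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # assumed heads: safety, comfort, traffic efficiency
        )

    def forward(self, state, action):
        scores = self.net(torch.cat([state, action], dim=-1))
        weights = torch.tensor([0.5, 0.25, 0.25])  # assumed weighting of the heads
        return (scores * weights).sum(dim=-1)


class Policy(nn.Module):
    """Gaussian policy over planned actions (placeholder for the VLA planner head)."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, state):
        return torch.distributions.Normal(self.mu(state), self.log_std.exp())


def ppo_update(policy, reward_model, optimizer, states, actions, old_log_probs,
               clip_eps=0.2):
    """One clipped PPO step using reward-model scores as a crude advantage signal."""
    with torch.no_grad():
        rewards = reward_model(states, actions)
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    new_log_probs = policy.dist(states).log_prob(actions).sum(dim=-1)
    ratio = (new_log_probs - old_log_probs).exp()
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    loss = -torch.min(ratio * advantages, clipped * advantages).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    policy, rwm = Policy(16, 4), RewardWorldModel(16, 4)
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    states, actions = torch.randn(32, 16), torch.randn(32, 4)
    with torch.no_grad():
        old_lp = policy.dist(states).log_prob(actions).sum(dim=-1)
    print("ppo loss:", ppo_update(policy, rwm, opt, states, actions, old_lp))
```

In a faithful reproduction, the advantage would come from rollouts scored by the reward world model (with a value baseline) rather than from a single batch of reward scores; the sketch only shows where the learned reward replaces the simulator signal.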
Takeaways, Limitations
• Takeaways:
◦ Improves the efficiency of closed-loop reinforcement learning by using a lightweight reward world model trained via inverse reinforcement learning.
◦ Improves autonomous driving performance through a multifaceted reward design that accounts for safety, comfort, and traffic efficiency.
◦ Demonstrates strong performance on the NAVSIM v2 benchmark and in the CVPR 2025 Autonomous Driving Grand Challenge.
◦ Provides a framework to accelerate VLA research in closed-loop autonomous driving.
• Limitations:
◦ Further research is needed on the generalization performance of the proposed method.
◦ Performance still needs to be verified in real-world driving environments.
◦ Further analysis of the accuracy and reliability of the reward world model is needed.