This paper presents IRL-VLA to address two key challenges of Vision-Language-Action (VLA) models for autonomous driving: existing VLA architectures are typically trained with imitation learning in an open-loop setting, which limits their performance, while closed-loop training relies heavily on high-fidelity sensor simulation. IRL-VLA is a closed-loop reinforcement learning framework that combines a lightweight reward world model, built via inverse reinforcement learning (IRL), with a self-constructed VLA approach. The framework proceeds in three steps: first, the VLA policy is pretrained with imitation learning; second, a lightweight reward world model is learned via IRL, enabling efficient closed-loop reward computation; third, the policy is refined with reward-world-model-guided reinforcement learning using Proximal Policy Optimization (PPO) to balance safety, comfort, and traffic efficiency. IRL-VLA achieves state-of-the-art performance on the NAVSIM v2 end-to-end driving benchmark and ranked first in the CVPR 2025 Autonomous Driving Grand Challenge.
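To make the third step concrete, reward-world-model-guided PPO can be pictured as a standard clipped PPO update in which the reward signal comes from the learned reward model instead of a sensor simulator. The Python sketch below is a minimal illustration under assumed names, dimensions, network shapes, and reward weights; none of it is taken from the paper, and the actual IRL-VLA policy is a full VLA model rather than the small MLPs used here.

```python
# Minimal sketch (not the authors' code) of stage 3: PPO fine-tuning of a
# pretrained driving policy, with rewards supplied by a learned reward world
# model instead of a sensor simulator. All names, sizes, and weights are
# illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 64, 2  # hypothetical ego-state / planned-action encoding sizes

class RewardWorldModel(nn.Module):
    """Stand-in for the lightweight reward world model: scores a (state, action)
    pair with separate safety / comfort / efficiency heads."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU())
        self.heads = nn.ModuleDict(
            {k: nn.Linear(128, 1) for k in ("safety", "comfort", "efficiency")})

    def forward(self, state, action):
        h = self.backbone(torch.cat([state, action], dim=-1))
        # Weighted sum balancing the three objectives (weights are assumptions).
        return (1.0 * self.heads["safety"](h)
                + 0.3 * self.heads["comfort"](h)
                + 0.3 * self.heads["efficiency"](h))

class Policy(nn.Module):
    """Gaussian policy with a value head; stands in for the pretrained VLA policy."""
    def __init__(self):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                nn.Linear(128, ACTION_DIM))
        self.log_std = nn.Parameter(torch.zeros(ACTION_DIM))
        self.value = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                   nn.Linear(128, 1))

    def dist(self, state):
        return torch.distributions.Normal(self.mu(state), self.log_std.exp())

def ppo_step(policy, reward_model, states, actions, old_log_probs, optimizer,
             clip_eps=0.2):
    """One clipped-PPO update; advantages are approximated as the reward-model
    score minus the value baseline (no GAE, for brevity)."""
    with torch.no_grad():
        rewards = reward_model(states, actions).squeeze(-1)
    values = policy.value(states).squeeze(-1)
    advantages = (rewards - values).detach()

    dist = policy.dist(states)
    log_probs = dist.log_prob(actions).sum(-1)
    ratio = (log_probs - old_log_probs).exp()
    policy_loss = -torch.min(ratio * advantages,
                             ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages).mean()
    value_loss = (values - rewards).pow(2).mean()
    loss = policy_loss + 0.5 * value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors, just to show the expected shapes.
policy, rwm = Policy(), RewardWorldModel()
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
states = torch.randn(32, STATE_DIM)
with torch.no_grad():
    d = policy.dist(states)
    actions = d.sample()
    old_log_probs = d.log_prob(actions).sum(-1)
ppo_step(policy, rwm, states, actions, old_log_probs, opt)
```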
Takeaways, Limitations
• Takeaways:
◦ Presents a novel VLA framework (IRL-VLA) that makes closed-loop reinforcement learning efficient by using a lightweight reward world model learned via inverse reinforcement learning.
◦ Improves driving performance by combining imitation learning, inverse reinforcement learning, and PPO-based reinforcement learning to balance safety, comfort, and traffic efficiency.
◦ Demonstrates state-of-the-art results on the NAVSIM v2 benchmark and first place in the CVPR 2025 Autonomous Driving Grand Challenge.
◦ Contributes to the advancement of VLA research for closed-loop autonomous driving.
• Limitations:
◦ The generalization performance of the proposed IRL-VLA framework needs further evaluation.
◦ Additional research is needed to verify performance and ensure safety in real-world environments.
◦ The design and training process of the lightweight reward world model are not described in detail (a generic IRL-style recipe is sketched after this list purely for orientation).
◦ Adaptability to diverse environments and scenarios requires further research.
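Because the summary does not describe how the reward world model itself is trained, the snippet below only sketches one common IRL-style recipe: a discriminator-like objective in which expert trajectories should score higher than policy rollouts. It reuses the RewardWorldModel stand-in from the earlier sketch and should not be read as the authors' actual objective.

```python
# Generic IRL-style reward-learning sketch (an assumption, not the paper's method):
# train the reward model so expert (human-driven) state-action pairs receive
# higher scores than pairs sampled from the current policy.
import torch
import torch.nn.functional as F

def irl_reward_loss(reward_model, expert_states, expert_actions,
                    policy_states, policy_actions):
    """Discriminator-style loss: expert pairs labeled 1, policy pairs labeled 0,
    with the predicted reward used as the logit."""
    expert_logits = reward_model(expert_states, expert_actions)
    policy_logits = reward_model(policy_states, policy_actions)
    return (F.binary_cross_entropy_with_logits(expert_logits,
                                                torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(policy_logits,
                                                 torch.zeros_like(policy_logits)))

# Usage mirrors the PPO sketch: expert pairs would come from driving logs and
# policy pairs from rollouts, e.g.
#   irl_reward_loss(rwm, exp_s, exp_a, pol_s, pol_a).backward()
```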