Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving

Created by
  • Haebom

Author

Antonio Guillen-Perez

Outline

This paper addresses the problem of learning robust autonomous driving policies from large-scale real-world datasets. Given the challenges of online data collection, the authors first build and compare several behavior cloning (BC) baselines, including a Transformer-based model with an entity-centric state representation. These BC models, however, prove brittle in long-horizon simulation: small errors compound and push the policy into unfamiliar states. To address this, the authors apply Conservative Q-Learning (CQL), a state-of-the-art offline reinforcement learning algorithm, to the same data and architecture. With a carefully designed reward function, the CQL agent learns a conservative value function that recovers from minor errors and avoids out-of-distribution states. In a large-scale evaluation on 1,000 unseen scenarios from the Waymo Open Motion Dataset, the CQL agent achieves a 3.2x higher success rate and a 7.4x lower collision rate than the best-performing BC baseline, demonstrating the value of offline reinforcement learning for learning robust, long-horizon driving policies from static expert data.
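To make the core idea concrete: CQL augments the usual Bellman objective with a penalty that pushes down Q-values on actions the policy might pick but the dataset never took. A minimal discrete-action sketch of that penalty term (this is an illustration of the standard CQL regularizer, not code from the paper; `alpha` and the shapes are assumptions):

```python
import numpy as np

def cql_penalty(q_values, data_action_idx, alpha=1.0):
    """Sketch of the CQL conservatism penalty for discrete actions.

    q_values:        (batch, n_actions) Q estimates for candidate actions.
    data_action_idx: (batch,) index of the action actually taken in the dataset.

    Returns alpha * mean( logsumexp_a Q(s, a) - Q(s, a_data) ), which is
    minimized by keeping Q low on out-of-distribution actions while
    preserving the value of the dataset action.
    """
    # Numerically stable log-sum-exp over the action axis.
    row_max = q_values.max(axis=1)
    logsumexp = np.log(np.exp(q_values - row_max[:, None]).sum(axis=1)) + row_max
    # Q-value of the action recorded in the offline dataset.
    q_data = q_values[np.arange(len(q_values)), data_action_idx]
    return alpha * np.mean(logsumexp - q_data)
```

In a full agent this term is added to the temporal-difference loss; since logsumexp is always at least the maximum Q-value, the penalty is non-negative and shrinks when the dataset action is also the highest-valued one.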

Takeaways, Limitations

Takeaways:
Offline reinforcement learning (CQL) can learn autonomous driving policies that are significantly more robust over long horizons than behavior cloning (BC).
Transformer-based models with entity-centric state representations perform well under BC, and perform even better when combined with offline reinforcement learning.
A carefully designed reward function plays a crucial role in the robustness of the CQL agent.
The effectiveness of the proposed method was verified through large-scale experiments using the Waymo Open Motion Dataset.
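The takeaways above stress the role of the shaped reward. The paper's exact reward terms are not reproduced here, so the following is a hypothetical sketch of the kind of shaped driving reward typically used in this setting (component names and weights are assumptions for illustration):

```python
def driving_reward(progress_m, collided, off_road,
                   w_progress=1.0, w_collision=100.0, w_offroad=10.0):
    """Illustrative shaped reward for offline RL driving.

    progress_m: meters of forward progress along the route this step.
    collided:   whether the ego vehicle collided this step.
    off_road:   whether the ego vehicle left the drivable area.

    Rewards progress while heavily penalizing safety violations; the
    weights are hypothetical and would need tuning per benchmark.
    """
    return (w_progress * progress_m
            - w_collision * float(collided)
            - w_offroad * float(off_road))
```

The large collision weight encodes the conservatism the summary describes: a value function trained on this signal assigns sharply lower values to states that lead toward crashes.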
Limitations:
Reward function design remains challenging, and the chosen design can significantly affect performance.
The CQL algorithm can be computationally expensive.
Performance in real-world environments requires further verification.
Generalization performance may vary depending on the characteristics of the dataset used.
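For readers unfamiliar with the "entity-centric" input format mentioned above: each surrounding agent is encoded as a feature vector, and a variable number of agents is packed into a fixed-size tensor plus a validity mask before being fed to a Transformer encoder. A minimal sketch of that packing step (`MAX_AGENTS` and the per-agent features are assumptions, not the paper's exact configuration):

```python
import numpy as np

MAX_AGENTS = 8   # hypothetical cap on agents per scene
FEAT_DIM = 5     # e.g. x, y, heading, speed, ego flag (illustrative)

def entity_centric_state(agents):
    """Pack variable-length per-agent features into a fixed tensor + mask.

    agents: list of length-FEAT_DIM feature vectors, one per scene entity.
    Returns a (MAX_AGENTS, FEAT_DIM) array and a boolean mask marking
    which rows hold real agents, the usual input shape for a Transformer
    encoder attending over scene entities.
    """
    state = np.zeros((MAX_AGENTS, FEAT_DIM), dtype=np.float32)
    mask = np.zeros(MAX_AGENTS, dtype=bool)
    for i, feats in enumerate(agents[:MAX_AGENTS]):
        state[i] = feats
        mask[i] = True
    return state, mask
```

The mask lets attention layers ignore padding rows, so the same policy network handles scenes with any number of nearby vehicles or pedestrians.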