Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

Created by
  • Haebom

Authors

Maciej K. Wozniak, Lianhang Liu, Yixi Cai, Patric Jensfelt

Outline

PRIX (Plan from Raw Pixels) is an efficient end-to-end architecture that predicts safe paths for autonomous driving using only camera data. It removes the dependency on large-scale models, expensive LiDAR sensors, and computationally intensive BEV (Bird's Eye View) feature representations. Instead, it combines a visual feature extractor with a generative planning head to predict paths directly from raw pixel inputs. Its core component, the Context-aware Recalibration Transformer (CaRT), enhances multi-level visual features to make planning more robust. PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, performing comparably to large-scale multimodal diffusion planning models while being much more efficient in inference speed and model size. The authors therefore position it as a practical solution for real-world deployment. The source code will be released.

Takeaways, Limitations

Takeaways:
Demonstrates end-to-end autonomous driving using only cameras, without relying on LiDAR.
Achieves strong performance without BEV representations, reducing computational cost.
Offers a lightweight model suited to deployment in real environments.
Achieves state-of-the-art performance on the NavSim and nuScenes benchmarks.
Plans an open-source release, which improves accessibility.
Limitations:
Beyond the NavSim and nuScenes benchmarks, performance still needs to be verified on other datasets and in real-world environments.
Further research is needed on how well the CaRT module generalizes and adapts to varied environments.
Robustness against the complexity and unpredictability of real road conditions still needs to be verified.