Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction

Created by
  • Haebom

Authors

Zhiyi Hou, Enhui Ma, Fang Li, Zhiyi Lai, Kalok Ho, Zhanqian Wu, Lijun Zhou, Long Chen, Chitian Sun, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Kaicheng Yu

Outline

In this paper, we propose a method for improving the motion risk prediction of Vision-Language Models (VLMs) by synthesizing high-risk driving data, addressing the difficulty of predicting safety in long-tail autonomous driving scenarios. Using Bird's-Eye View (BEV)-based motion simulation, we model risk from three sources (the ego vehicle, other vehicles, and the environment) and generate DriveMRP-10K, a high-risk driving dataset suitable for VLM training. In addition, we propose DriveMRP-Agent, a VLM-agnostic risk estimation framework that integrates global information, the ego-vehicle viewpoint, and a novel information injection strategy for trajectory prediction so that the VLM can reason effectively about spatial relationships. Experimental results show that fine-tuning DriveMRP-Agent with DriveMRP-10K significantly improves the motion risk prediction performance of multiple VLM-based models (crash recognition accuracy increases from 27.13% to 88.03%) and that the approach generalizes well to a real-world high-risk driving dataset (accuracy increases from 29.42% to 68.50%).
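To put the reported accuracy figures in perspective, the short sketch below computes the absolute gain (in percentage points) and the relative improvement factor for the two results quoted above. The dictionary labels and the helper name `accuracy_gain` are illustrative assumptions, not taken from the paper; only the four accuracy values come from the summary.

```python
# Accuracy figures (in percent) quoted in the summary above.
# The dictionary keys are descriptive labels chosen for this sketch.
RESULTS = {
    "crash recognition": (27.13, 88.03),
    "real high-risk driving data": (29.42, 68.50),
}


def accuracy_gain(before, after):
    """Return (gain in percentage points, relative improvement factor).

    Hypothetical helper for illustration; both values are rounded
    to two decimal places.
    """
    return round(after - before, 2), round(after / before, 2)


if __name__ == "__main__":
    for name, (before, after) in RESULTS.items():
        points, factor = accuracy_gain(before, after)
        print(f"{name}: +{points:.2f} points, {factor:.2f}x the baseline")
```

Under these figures, crash recognition improves by 60.90 percentage points, and accuracy on the real high-risk driving data improves by 39.08 percentage points.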

Takeaways, Limitations

Takeaways:
We demonstrate that synthesizing high-risk driving data can significantly improve the safety of VLM-based autonomous driving systems.
The proposed DriveMRP-Agent framework is VLM-agnostic, so it can be applied to various VLMs, and it generalizes well to real-world high-risk driving data.
Generating high-risk scenario data through BEV-based motion simulation offers a reusable approach for future research on long-tail driving safety.
Limitations:
The size and diversity of the DriveMRP-10K dataset may not fully encompass all high-risk situations in the real world.
Detailed information on the size and composition of the in-house real-world high-risk motion dataset is lacking, which makes reproducibility difficult to ensure.
The computational cost and real-time performance of the proposed method are not analyzed.