Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Edge-Based Multimodal Sensor Data Fusion with Vision Language Models (VLMs) for Real-time Autonomous Vehicle Accident Avoidance

Created by
  • Haebom

Author

Fengze Yang, Bo Yu, Yang Zhou, Xuewen Luo, Zhengzhong Tu, Chenxi Liu

Outline

This paper proposes REACT (Real-time Edge-based Autonomous Co-pilot Trajectory planner), a real-time, lightweight vision-language-model-based trajectory planning framework that integrates vehicle-to-everything (V2X) communication to overcome the perception limitations of onboard-only autonomous driving systems. REACT fine-tunes a lightweight vision-language model (VLM) to fuse infrastructure-provided hazard alerts with in-vehicle sensor data: visual embeddings capture complex traffic dynamics and vehicle intent, symbolic inputs convey precise numerical data, and context-aware reasoning produces safety-centric, optimized trajectories. For real-time deployment, REACT employs a Residual Trajectory Fusion (RTF) design and specialized edge-adaptation strategies to reduce model complexity and improve inference efficiency. On the DeepAccident benchmark, REACT achieves state-of-the-art results: a 77% reduction in collision rate, a 48.2% improvement in Video Panoptic Quality (VPQ), and an inference latency of 0.57 seconds.
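The summary above names two mechanisms: serializing infrastructure hazard alerts into symbolic text the VLM can interpret, and a residual design in which the model predicts a correction on top of a base trajectory rather than absolute waypoints. The paper's actual implementation is not shown here; a minimal sketch of both ideas, with all names, fields, and message formats being illustrative assumptions, might look like this:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class HazardAlert:
    """Hypothetical V2X hazard message delivered by roadside infrastructure."""
    hazard_type: str   # e.g. "stalled vehicle"
    distance_m: float  # distance from ego vehicle
    bearing_deg: float # bearing relative to ego heading

def build_prompt(alert: HazardAlert, ego_speed_mps: float) -> str:
    # Serialize the symbolic V2X input into text so the VLM can combine it
    # with visual embeddings of the in-vehicle camera stream.
    return (f"Hazard: {alert.hazard_type} at {alert.distance_m:.0f} m, "
            f"bearing {alert.bearing_deg:.0f} deg. "
            f"Ego speed: {ego_speed_mps:.1f} m/s. Plan a safe trajectory.")

def fuse_residual(base: List[Tuple[float, float]],
                  residual: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    # Residual-style fusion: add a small model-predicted correction to each
    # waypoint of a base trajectory instead of regressing waypoints from scratch.
    return [(bx + rx, by + ry) for (bx, by), (rx, ry) in zip(base, residual)]
```

For example, a base straight-ahead trajectory `[(0, 0), (1, 0)]` with a predicted lateral residual of 0.5 m per waypoint fuses to `[(0, 0.5), (1, 0.5)]`, a gentle evasive offset.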

Takeaways, Limitations

Takeaways:
  • Demonstrates the effectiveness of real-time collaborative planning with a lightweight VLM.
  • Shows that language-guided situational reasoning can improve traffic safety and responsiveness.
  • Overcomes the perception limits of onboard-only autonomous driving through V2X integration.
  • Improves real-time performance through the RTF design and edge-adaptation strategies.
  • Achieves state-of-the-art performance on the DeepAccident benchmark.
Limitations:
  • Generalization beyond the specific environments of the DeepAccident benchmark remains to be verified.
  • Robustness under diverse weather conditions and complex traffic situations requires further study.
  • Practical applicability is constrained by the performance limits of edge devices.
  • Performance may degrade due to biases in the VLM's training data.