Daily Arxiv

This page curates AI-related papers from around the world.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and institutions; please credit the source when sharing.

Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving

Created by
  • Haebom

Author

Enming Zhang, Peizhe Gong, Xingyuan Dai, Min Huang, Yisheng Lv, Qinghai Miao

Outline

This paper presents the Safety Cognition Driving Benchmark (SCD-Bench), a novel benchmark for evaluating the safety cognition of vision-language models (VLMs) in autonomous driving systems. To address the scalability issue of data annotation, we introduce Autonomous Driving Annotation (ADA), a semi-automated annotation system whose outputs are reviewed by autonomous driving experts. The automated evaluation pipeline achieves over 98% agreement with human experts' judgments. Furthermore, we build SCD-Training, the first large-scale dataset for this task (324,350 high-quality samples), to improve the safety cognition capabilities of VLMs. Experimental results show that models trained on SCD-Training achieve improved performance not only on SCD-Bench but also on general and domain-specific benchmarks.
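The 98% figure reported above is an agreement rate between the automated pipeline's judgments and expert labels. Below is a minimal sketch of how such agreement can be computed; the label format, function names, and example data are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumed setup, not the paper's implementation): measuring how
# often an automated evaluation pipeline agrees with human expert judgments.

from collections import Counter

def percent_agreement(auto_labels, expert_labels):
    """Fraction of items where the automated judgment matches the expert judgment."""
    assert len(auto_labels) == len(expert_labels)
    matches = sum(a == e for a, e in zip(auto_labels, expert_labels))
    return matches / len(auto_labels)

def cohens_kappa(auto_labels, expert_labels):
    """Chance-corrected agreement; useful when label classes are imbalanced."""
    n = len(auto_labels)
    p_o = percent_agreement(auto_labels, expert_labels)
    auto_counts = Counter(auto_labels)
    expert_counts = Counter(expert_labels)
    labels = set(auto_counts) | set(expert_counts)
    p_e = sum((auto_counts[l] / n) * (expert_counts[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Hypothetical example: binary 'safe' / 'unsafe' judgments on five items.
auto = ["safe", "unsafe", "safe", "safe", "unsafe"]
expert = ["safe", "unsafe", "safe", "unsafe", "unsafe"]
print(f"agreement: {percent_agreement(auto, expert):.0%}")  # -> 80%
print(f"kappa: {cohens_kappa(auto, expert):.2f}")           # -> 0.62
```

Raw agreement is the headline number, but a chance-corrected statistic such as Cohen's kappa is a common companion check when one class (e.g., "safe") dominates the data.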

Takeaways, Limitations

Takeaways:
We present a new benchmark (SCD-Bench) for evaluating the safety cognition capabilities of VLMs in autonomous driving environments, together with SCD-Training, a large-scale dataset for training them.
The semi-automated annotation system (ADA) improves the efficiency and scalability of data annotation.
The automated evaluation pipeline enables consistent evaluation.
Models trained on SCD-Training show improved performance across various benchmarks, suggesting they can contribute to safer autonomous driving systems.
Limitations:
Further validation is needed regarding the accuracy limits of the ADA system and the subjectivity of expert review.
The generalizability of SCD-Bench and SCD-Training requires further study; they may be biased toward specific environments or situations.
Safety in real-world autonomous driving still needs to be verified, including how well benchmark results transfer to real-world conditions.