This paper presents the Safety Cognition Driving Benchmark (SCD-Bench), a novel benchmark for evaluating the safety cognition of vision-language models (VLMs) in autonomous driving systems. To address the scalability challenge of data annotation, we introduce the Autonomous Driving Annotation (ADA) system, a semi-automated annotation pipeline whose outputs are reviewed by autonomous driving experts. Our automated evaluation pipeline achieves over 98% agreement with the judgments of human experts. Furthermore, we construct SCD-Training, the first large-scale dataset for this task, comprising 324,350 high-quality samples, to improve the safety cognition capabilities of VLMs. Experimental results show that models trained on SCD-Training exhibit marked improvements not only on SCD-Bench but also on general and domain-specific benchmarks.