Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving

Created by
  • Haebom

Authors

Mihir Godbole, Xiangbo Gao, Zhengzhong Tu

Outline

This paper argues that predicting the short-term motion of vulnerable road users (VRUs) is essential to safe autonomous driving, particularly in urban environments where ambiguous or risky behavior is common. Existing vision-language models (VLMs) support open-vocabulary recognition, but their use for fine-grained intent inference remains largely unexplored. To address this gap, the paper presents DRAMA-X, a fine-grained benchmark built from the DRAMA dataset through an automatic annotation pipeline. DRAMA-X covers 5,686 accident-prone frames, each annotated with object bounding boxes, nine-directional intent classifications, a binary risk score, expert-generated action suggestions for the autonomous vehicle, and a descriptive motion summary. These annotations enable a structured evaluation of four interrelated tasks (object detection, intent prediction, risk assessment, and action suggestion) that are central to autonomous driving decision-making. As a baseline, the paper proposes SGG-Intent, a lightweight, training-free framework that mirrors the inference pipeline of an autonomous vehicle. SGG-Intent works in sequence: it generates a scene graph from the visual input with a VLM-based detector, then infers intent, assesses risk, and recommends an action in a compositional inference step driven by a large language model (LLM). The authors evaluate several state-of-the-art VLMs and compare their performance across the four DRAMA-X tasks. The experimental results indicate that scene graph-based inference improves intent prediction and risk assessment, especially when contextual cues are modeled explicitly.
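The sequential structure described above (scene graph, then intent, then risk, then action) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class names, the bounding box layout, the relation format, the `run_pipeline` function, and the toy stage functions are all hypothetical. In the actual framework the stages would be backed by a VLM-based detector and an LLM, and the intent label would be one of the nine directional classes, which this summary does not list.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max). The summary does not specify one.
BBox = Tuple[float, float, float, float]


@dataclass
class SceneObject:
    label: str               # e.g. "pedestrian"
    bbox: BBox
    intent: str = "unknown"  # placeholder; DRAMA-X uses nine directional classes


@dataclass
class SceneGraph:
    objects: List[SceneObject] = field(default_factory=list)
    # Hypothetical relation format: (subject index, predicate, object index)
    relations: List[Tuple[int, str, int]] = field(default_factory=list)


@dataclass
class FrameResult:
    graph: SceneGraph
    risk: bool    # binary risk score
    action: str   # suggested action for the ego vehicle


def run_pipeline(
    image: Any,
    build_scene_graph: Callable[[Any], SceneGraph],
    infer_intent: Callable[[SceneGraph, SceneObject], str],
    assess_risk: Callable[[SceneGraph], bool],
    suggest_action: Callable[[SceneGraph, bool], str],
) -> FrameResult:
    """Run the four stages in order, each consuming the previous stage's output."""
    graph = build_scene_graph(image)           # stage 1: detection + scene graph
    for obj in graph.objects:                  # stage 2: per-object intent
        obj.intent = infer_intent(graph, obj)
    risk = assess_risk(graph)                  # stage 3: frame-level risk
    action = suggest_action(graph, risk)       # stage 4: action suggestion
    return FrameResult(graph=graph, risk=risk, action=action)


# Toy stand-ins for the model-backed stages, so the sketch runs end to end.
def toy_build_scene_graph(image: Any) -> SceneGraph:
    return SceneGraph(objects=[SceneObject("pedestrian", (10.0, 20.0, 50.0, 120.0))])


def toy_infer_intent(graph: SceneGraph, obj: SceneObject) -> str:
    # "crossing" is an illustrative label, not one of the benchmark's classes.
    return "crossing" if obj.label == "pedestrian" else "unknown"


def toy_assess_risk(graph: SceneGraph) -> bool:
    return any(o.intent == "crossing" for o in graph.objects)


def toy_suggest_action(graph: SceneGraph, risk: bool) -> str:
    return "slow down" if risk else "maintain speed"


result = run_pipeline(
    None, toy_build_scene_graph, toy_infer_intent, toy_assess_risk, toy_suggest_action
)
print(result.risk, result.action)
```

Passing the stages in as callables keeps the sketch training-free in spirit: any detector or language model can be swapped in without changing the control flow.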

Takeaways, Limitations

Takeaways:
The DRAMA-X benchmark provides a fine-grained, structured basis for evaluating VRU intent prediction in autonomous driving.
SGG-Intent presents an effective approach for intent inference and risk assessment using VLMs.
Experiments show that scene graph-based inference improves the accuracy of VRU intent prediction and risk assessment.
Limitations:
The size and diversity of the DRAMA-X dataset may be limited.
The performance of SGG-Intent depends on the performance of the VLM and LLM used.
Further research is needed on generalization performance in real-world environments.
Robustness assessments for various environments and situations may be lacking.