This paper introduces DriveAction, a novel benchmark for evaluating Vision-Language-Action (VLA) models in autonomous driving. DriveAction is designed to overcome the limitations of existing benchmarks, namely the lack of diverse driving scenarios, of reliable action-level annotations, and of evaluation protocols aligned with human preferences. The benchmark is built from real-world autonomous driving data and comprises 2,610 driving scenarios with 16,185 QA pairs. Its high-level discrete action labels are collected directly from drivers' actual driving behavior, and it employs an action-based, tree-structured evaluation framework that explicitly links vision, language, and action. Experimental results show that state-of-the-art VLMs require both visual and language guidance for accurate action prediction: accuracy degrades by 3.3% without visual input, by 4.1% without language input, and by 8.0% without both.
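As a rough illustration of the ablation reported above (not the authors' released evaluation tooling), action-prediction accuracy can be scored under different input configurations by selectively withholding the visual and language inputs. The sketch below uses hypothetical names such as `EvalSample` and `predict_action`; it only shows the shape of such a vision/language ablation, with the model call left as a stub.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sample structure; field names are illustrative, not from the DriveAction release.
@dataclass
class EvalSample:
    images: Optional[List[str]]   # camera frames (e.g., file paths)
    language: Optional[str]       # navigation / scene-description guidance
    gold_action: str              # high-level discrete action label, e.g., "turn_left"


def predict_action(images: Optional[List[str]], language: Optional[str]) -> str:
    """Placeholder for a VLM query; a real evaluation would call the model here."""
    return "go_straight"


def accuracy(samples: List[EvalSample], use_vision: bool, use_language: bool) -> float:
    """Score action prediction under one input configuration of the ablation."""
    correct = 0
    for s in samples:
        pred = predict_action(
            s.images if use_vision else None,
            s.language if use_language else None,
        )
        correct += int(pred == s.gold_action)
    return correct / len(samples)


if __name__ == "__main__":
    demo = [EvalSample(["front.jpg"], "turn left at the next intersection", "turn_left")]
    for use_vis, use_lang, name in [(True, True, "full input"),
                                    (False, True, "no vision"),
                                    (True, False, "no language"),
                                    (False, False, "neither")]:
        print(f"{name:>12}: {accuracy(demo, use_vis, use_lang):.3f}")
```

Comparing the four configurations in this way yields the kind of accuracy drops quoted above when vision, language, or both are removed.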