GRAFT is a structured multimodal benchmark for evaluating models on instruction-following, visual reasoning, and visual-text alignment tasks. It comprises programmatically generated charts and synthetically rendered tables, produced with a Python visualization library, giving full control over data semantics, structure, and visual clarity. Each GRAFT instance pairs a chart or table image with a systematically generated multi-step analytical question grounded solely in the visual content. Answers are provided in structured formats such as JSON or YAML, enabling consistent evaluation of both reasoning processes and output formats. The benchmark supports comprehensive evaluation by introducing a taxonomy of reasoning types, including comparison, trend identification, ranking, aggregation, proportion estimation, and anomaly detection. Reference answers follow strict factual and formatting guidelines, supporting accurate, aspect-based evaluation. GRAFT sets a new standard for evaluation in this field by providing a unified, scalable framework for fine-grained benchmarking of multimodal models on visually grounded structured reasoning tasks.
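
To make the generation scheme concrete, the sketch below shows how such an instance might be produced: a chart is rendered with matplotlib from synthetic data whose ground truth is known exactly, and a paired multi-step question with a JSON-structured reference answer is emitted. This is a minimal illustration under stated assumptions, not GRAFT's actual pipeline; the function name, schema fields, and question template are hypothetical.

```python
# Hypothetical sketch of a GRAFT-style instance generator; the schema
# and helper names are illustrative assumptions, not the benchmark's code.
import json
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt


def make_instance(instance_id: int, seed: int = 0) -> dict:
    """Render a bar chart from synthetic data and pair it with a
    multi-step question whose reference answer is structured JSON."""
    rng = random.Random(seed)
    categories = ["North", "South", "East", "West"]
    values = [rng.randint(10, 100) for _ in categories]

    # Render the chart image the model will reason over.
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(categories, values)
    ax.set_xlabel("Region")
    ax.set_ylabel("Sales (units)")
    ax.set_title("Quarterly sales by region")
    image_path = f"chart_{instance_id:04d}.png"
    fig.savefig(image_path, dpi=150, bbox_inches="tight")
    plt.close(fig)

    # Because the underlying data is synthetic, the reference answer is
    # known exactly; this question combines ranking and aggregation.
    ranked = sorted(zip(categories, values), key=lambda kv: -kv[1])
    return {
        "id": instance_id,
        "image": image_path,
        "reasoning_types": ["ranking", "aggregation"],
        "question": (
            "Which region has the highest sales, and what is the "
            "total across all regions? Answer in JSON."
        ),
        "reference_answer": {
            "highest_region": ranked[0][0],
            "total_sales": sum(values),
        },
    }


if __name__ == "__main__":
    instance = make_instance(instance_id=1, seed=42)
    print(json.dumps(instance, indent=2))
```

Because the answer is derived from the same synthetic data used to render the image, model outputs can be scored field by field against the reference JSON, which is what makes the aspect-based evaluation described above tractable at scale.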