This paper presents a benchmark for evaluating the historical contextualization capabilities of text-to-image (TTI) diffusion models. Using the HistVis dataset, we evaluate how TTI models represent specific eras in terms of implicit stylistic associations, historical consistency, and demographic representation. Our results show that TTI models systematically misrepresent historical subjects: they overuse certain visual styles, introduce anachronistic elements, and fail to reflect realistic demographic patterns.