This paper addresses the problem of inconsistent and sometimes flawed evaluation protocols in differential privacy (DP) image synthesis and proposes DPImageBench, a standardized evaluation benchmark for DP image synthesis. DPImageBench systematically evaluates eleven major methods on nine datasets using seven fidelity and utility metrics. Specifically, we find that the common practice of selecting the downstream classifier that achieves the highest accuracy on the sensitive test set violates DP and overestimates the utility score, and we correct this protocol. Furthermore, we demonstrate that pretraining on public image datasets is not always beneficial, and that the distributional similarity between the pretraining and sensitive images significantly affects the quality of the synthesized images. Finally, we find that adding noise to the low-dimensional features of sensitive images, rather than to high-dimensional quantities (e.g., weight gradients), is less sensitive to the privacy budget and yields better performance under small privacy budgets.
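
To make the first finding concrete, the following Python sketch contrasts the flawed selection protocol with a DP-respecting alternative on toy data. The arrays, the LogisticRegression candidates, and the use of a held-out split of the synthetic data for model selection are illustrative assumptions, not DPImageBench's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy stand-ins: "synthetic" images produced by a DP generator and the
# sensitive test set (hypothetical arrays for illustration only).
X_syn, y_syn = rng.normal(size=(1000, 32)), rng.integers(0, 10, 1000)
X_sens_test, y_sens_test = rng.normal(size=(500, 32)), rng.integers(0, 10, 500)

# Candidate downstream classifiers (e.g., different hyper-parameters).
candidates = [LogisticRegression(C=c, max_iter=200) for c in (0.1, 1.0, 10.0)]

# Flawed protocol: pick the candidate with the best accuracy on the sensitive
# test set. The selection step itself queries sensitive data outside the DP
# mechanism, so the reported score is no longer covered by the DP guarantee
# and is biased upward by the implicit search over candidates.
flawed_scores = []
for clf in candidates:
    clf.fit(X_syn, y_syn)
    flawed_scores.append(clf.score(X_sens_test, y_sens_test))
flawed_best = max(flawed_scores)

# Corrected protocol (a sketch, assuming selection is done on a held-out split
# of the synthetic data): the sensitive test set is touched exactly once, only
# to report the final score of the already-chosen classifier.
X_tr, X_val, y_tr, y_val = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)
val_scores = []
for clf in candidates:
    clf.fit(X_tr, y_tr)
    val_scores.append(clf.score(X_val, y_val))
chosen = candidates[int(np.argmax(val_scores))]
reported = chosen.score(X_sens_test, y_sens_test)

print(f"flawed (max over sensitive test set): {flawed_best:.3f}")
print(f"corrected (selected on synthetic validation split): {reported:.3f}")
```

The key design point is that the sensitive test set is queried exactly once, after the classifier has been chosen, so the reported accuracy cannot be inflated by a search over candidates that peeks at sensitive data.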