This paper presents a systematic study of the speedup claims made for optimization algorithms proposed as replacements for AdamW in large-scale language model pretraining. We show that prior studies have skewed their comparisons through unfair hyperparameter tuning and limited evaluation settings, and we compare ten optimization algorithms across four model sizes and data-to-model ratios. Our results demonstrate that rigorous hyperparameter tuning and end-of-training evaluations across multiple model sizes and data-to-model ratios are essential for fair comparisons. Moreover, we find that the speedups reported in prior work are lower in practice and tend to decrease as model size increases. In particular, the fastest optimization algorithms, such as Muon and Soap, rely on matrix-based preconditioners, yet even their speedup over AdamW diminishes with increasing model size.