As model and dataset sizes grow rapidly, pre-training strategies that commit to a fixed computational budget in advance become increasingly limiting, and this paper explores more scalable alternatives. Specifically, we revisit the Schedule-Free (SF) method and analyze the behavior of SF-AdamW, which effectively traverses the "river" structure of the loss landscape without a decay phase or auxiliary averaging memory. Through theoretical and empirical analysis of SF dynamics, we show that SF implicitly performs weighted iterate averaging without memory overhead. Building on this analysis, we propose a refined variant of SF that is more robust to the momentum parameter and performs better at large batch sizes.
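For concreteness, the sketch below illustrates a Schedule-Free-style update, assuming the standard formulation in which gradients are evaluated at an interpolation of a base iterate and a running average of past base iterates; the function name `schedule_free_step`, the hyperparameter names, and the plain-SGD base step are illustrative only (SF-AdamW would replace the base step with an Adam-preconditioned one), and this is not the exact update analyzed in the paper.

```python
import torch


def schedule_free_step(x, z, grad_fn, t, lr=0.01, beta=0.9):
    """One Schedule-Free-style step (illustrative sketch).

    x : averaged iterate, used at evaluation time
    z : base iterate on which gradient steps are taken
    grad_fn : callable returning the gradient at a given point
    t : step index, starting at 1
    """
    # Gradients are evaluated at an interpolation between the two sequences.
    y = (1.0 - beta) * z + beta * x
    grad = grad_fn(y)

    # Gradient step on the base iterate (SF-AdamW would precondition this step).
    z = z - lr * grad

    # x is maintained as an online running average of the z iterates,
    # so no buffer of past weights is required.
    c = 1.0 / (t + 1)
    x = (1.0 - c) * x + c * z
    return x, z


# Toy usage: minimize ||w||^2 from a random start.
w0 = torch.randn(4)
x, z = w0.clone(), w0.clone()
for t in range(1, 101):
    x, z = schedule_free_step(x, z, grad_fn=lambda y: 2 * y, t=t)
```

The averaging weights implied by the sequence of coefficients c are what the paper's analysis makes explicit: the evaluation point x is a weighted average of past base iterates, obtained online rather than by storing a separate copy of the weights.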