This paper designs a suite of minimal algorithmic tasks that loosely abstract open-ended, real-world tasks, so as to quantitatively measure the creative limits of present-day language models. These tasks require an implicit, open-ended, and stochastic planning step: either discovering new connections in an abstract knowledge graph (as in wordplay, drawing analogies, or research) or constructing new patterns (as in designing math problems or novel proteins). On these tasks, we argue empirically and conceptually that next-token learning is myopic, and that multi-token approaches, such as teacherless training and diffusion models, are better at producing diverse and original outputs. Furthermore, we find that seed conditioning, which injects noise at the input layer to elicit randomness without hurting coherence, works as well as temperature sampling at the output layer, and under some conditions even better. In sum, this work provides a principled, minimal test bed for analyzing open-ended creative skills and offers new arguments for going beyond next-token learning and temperature sampling.
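To make the contrast between the two randomization strategies concrete, the sketch below compares temperature sampling at the output layer with a simple instantiation of seed conditioning, realized here as a random token prefix prepended to the prompt before greedy decoding. This is a minimal sketch under assumptions, not the paper's exact implementation: the model name, prefix length, and helper function names are illustrative choices.

```python
# Minimal sketch (illustrative, not the paper's exact setup): contrasting
# temperature sampling at the output layer with a simple seed-conditioning
# variant, where randomness is injected on the input side as a random token
# prefix and decoding itself stays deterministic.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any causal LM suffices for the contrast
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def temperature_sample(prompt: str, temperature: float = 1.0,
                       max_new_tokens: int = 32) -> str:
    """Randomness injected at the output layer via softmax temperature."""
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(out[0], skip_special_tokens=True)


def seed_conditioned_sample(prompt: str, seed_len: int = 8,
                            max_new_tokens: int = 32) -> str:
    """Randomness injected at the input: a random 'seed' prefix, then greedy decoding."""
    seed_ids = torch.randint(0, tokenizer.vocab_size, (1, seed_len))
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    input_ids = torch.cat([seed_ids, prompt_ids], dim=1)
    out = model.generate(
        input_ids,
        do_sample=False,  # deterministic decoding; diversity comes only from the seed
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Drop the random seed prefix before returning the continuation.
    return tokenizer.decode(out[0, seed_len:], skip_special_tokens=True)


if __name__ == "__main__":
    prompt = "Write a short pun about graphs:"
    print(temperature_sample(prompt, temperature=0.9))
    print(seed_conditioned_sample(prompt))
```

In this sketch, repeated calls to `seed_conditioned_sample` differ only through the random prefix, so output diversity is controlled entirely at the input while the decoder remains greedy, mirroring the abstract's claim that input-side noise can substitute for output-side temperature.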