This paper presents a novel technique for testing whether a foundation model's sequential predictions reflect deep domain understanding—in the spirit of how Kepler's accurate predictions of planetary motion paved the way for Newtonian mechanics. The technique evaluates how a foundation model adapts to synthetic datasets generated from a hypothesized world model, measuring how well the model's inductive biases align with that world model. Experiments across a variety of domains reveal that while the foundation model excels at its training task, it fails to develop inductive biases consistent with the underlying world model when adapted to new tasks. Specifically, we find that foundation models trained on orbital trajectories consistently fail to apply Newtonian mechanics to new physics tasks, instead relying on task-specific heuristics that do not generalize.
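
The evaluation protocol described above can be illustrated with a minimal sketch (this is an assumed simplification, not the paper's implementation): generate synthetic trajectories from a hypothesized world model (here, Newtonian gravity around a central body), adapt a simple sequence predictor to them, and then measure how well the predictor's extrapolations track the true dynamics beyond its adaptation window.

```python
import numpy as np

def simulate_orbit(pos, vel, gm=1.0, dt=0.01, steps=2000):
    """Symplectic Euler integration of a test mass under Newtonian gravity."""
    traj = np.empty((steps, 2))
    for t in range(steps):
        r = np.linalg.norm(pos)
        acc = -gm * pos / r**3      # inverse-square central force
        vel = vel + dt * acc        # update velocity first (symplectic)
        pos = pos + dt * vel
        traj[t] = pos
    return traj

def fit_ar_predictor(traj, order=4):
    """Least-squares autoregressive model: next state from `order` past states."""
    X = np.hstack([traj[i:len(traj) - order + i] for i in range(order)])
    y = traj[order:]
    W, *_ = np.linalg.lstsq(X, y, rcond=None)
    return W

def rollout(W, context, steps, order=4):
    """Extrapolate with the fitted predictor from the end of a context window."""
    buf = [context[-order + i] for i in range(order)]
    out = []
    for _ in range(steps):
        nxt = np.concatenate(buf) @ W
        out.append(nxt)
        buf = buf[1:] + [nxt]
    return np.array(out)

# Adapt the predictor on the first part of a trajectory, then compare its
# extrapolation against the held-out Newtonian continuation.
traj = simulate_orbit(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
train, test = traj[:1500], traj[1500:]
W = fit_ar_predictor(train)
pred = rollout(W, train, steps=len(test))
err = float(np.mean(np.linalg.norm(pred - test, axis=1)))
```

A small extrapolation error here would indicate the predictor's inductive biases match the generating world model; a foundation model relying on task-specific heuristics would instead show large errors on such held-out continuations. The autoregressive probe and the near-circular orbit setup are illustrative choices, not those of the paper.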