This paper presents a novel technique for testing the premise that foundation models can discover deep domain understanding through sequential prediction, much as Kepler’s predictions of planetary motion led to the discovery of Newtonian dynamics. By analyzing how foundation models adapt to synthetic datasets generated from an assumed world model, we develop an “inductive bias probe” that measures how well a model’s inductive biases match that world model. Experiments across a variety of domains show that foundation models perform well on their training tasks yet fail to develop inductive biases toward the underlying world model when adapted to new tasks. In particular, foundation models trained on orbital trajectories consistently fail to apply Newtonian dynamics to new physics tasks; instead, they develop task-specific heuristics that fail to generalize.
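
The probe idea described above can be sketched in miniature. The toy below is an illustrative assumption, not the paper's actual method: the "world model" is 1-D Newtonian free fall, "adapting a foundation model" is stood in for by a least-squares polynomial fit to a few observations from each synthetic task, and the probe scores how far the adapted model's extrapolations deviate from the world model on held-out portions of new trajectories. All function names and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def newtonian_trajectory(x0, v0, g=-9.8, n=20):
    """Assumed ground-truth world model: 1-D free fall under gravity."""
    ts = np.linspace(0.0, 1.0, n)
    return ts, x0 + v0 * ts + 0.5 * g * ts**2

def adapt(degree, ts, xs):
    """Stand-in for adapting a foundation model: a polynomial fit
    whose degree plays the role of the model's inductive bias."""
    return np.polynomial.Polynomial.fit(ts, xs, degree)

def inductive_bias_probe(degree, n_tasks=50, n_obs=5):
    """Adapt the model to a short prefix of each synthetic trajectory,
    then score mean squared error of its extrapolations against the
    world model's continuation. Lower = biases closer to the world model."""
    errs = []
    for _ in range(n_tasks):
        x0, v0 = rng.uniform(0, 10), rng.uniform(-5, 5)
        ts, xs = newtonian_trajectory(x0, v0)
        model = adapt(degree, ts[:n_obs], xs[:n_obs])      # adapt on prefix
        errs.append(np.mean((model(ts[n_obs:]) - xs[n_obs:]) ** 2))
    return float(np.mean(errs))

# A quadratic hypothesis class matches Newtonian free fall exactly, so its
# probe error is near zero; a linear "heuristic" extrapolates poorly.
quadratic_err = inductive_bias_probe(degree=2)
linear_err = inductive_bias_probe(degree=1)
```

In this sketch, the degree-2 model embodies the right inductive bias and extrapolates almost perfectly, while the degree-1 model fits the observed prefix but diverges on the continuation, mirroring the paper's finding that good in-task performance need not imply the right world model.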