Recent studies have found that test-time training (TTT) can yield significant performance improvements, yet why and when TTT is effective remains poorly understood. This paper argues that TTT is useful for globally underparameterized models and that it provides a mechanism for post-generalization specialization, i.e., focusing model capacity on the concepts relevant to the test task. Building on the linear representation hypothesis, we propose a model in which TTT achieves significantly smaller within-distribution test error than global training. We train a sparse autoencoder on ImageNet to verify the core assumption that semantically related data points are explained by a small number of shared concepts. Finally, we conduct scaling studies on image and language tasks to identify the regimes in which specialization is most effective.
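To make the specialization mechanism concrete, the sketch below shows a generic test-time training loop in PyTorch: a copy of a globally trained classifier is briefly adapted on the test batch with a self-supervised objective before predicting. The entropy-minimization objective, the function name `test_time_adapt`, and the hyperparameters are illustrative assumptions, not the procedure analyzed in this paper.

```python
# Minimal sketch of a test-time training (TTT) loop, assuming a standard
# PyTorch classifier. Entropy minimization stands in for whatever
# self-supervised test-time objective a given TTT method uses.
import copy
import torch
import torch.nn.functional as F


def test_time_adapt(model, x, steps=10, lr=1e-3):
    """Adapt a copy of `model` to a single test batch `x` before predicting."""
    adapted = copy.deepcopy(model)  # keep the globally trained weights intact
    adapted.train()
    params = [p for p in adapted.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        logits = adapted(x)
        log_probs = F.log_softmax(logits, dim=-1)
        probs = log_probs.exp()
        # Self-supervised objective on the test batch: minimizing prediction
        # entropy concentrates capacity on the concepts this batch actually uses.
        entropy = -(probs * log_probs).sum(dim=-1).mean()
        entropy.backward()
        optimizer.step()

    adapted.eval()
    with torch.no_grad():
        return adapted(x).argmax(dim=-1)
```

Because adaptation happens on a copy, the global model is left untouched and each test task receives its own specialized parameters, which is the sense in which TTT performs post-generalization specialization under this sketch's assumptions.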