Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models

Created by
  • Haebom

Author

Jonas Hübotter, Patrik Wolf, Alexander Shevchenko, Dennis Jüni, Andreas Krause, Gil Kur

Outline

Recent studies have found that test-time training (TTT) can yield significant performance improvements, but understanding of why and when TTT is effective remains limited. This paper argues that TTT is useful for globally underparameterized models because it provides a mechanism for specialization after generalization, i.e., focusing capacity on the concepts relevant to the test task. Building on the linear representation hypothesis, the authors propose a model in which TTT achieves significantly smaller in-distribution test error than global training. To verify the core assumption that semantically related data points are explained by a small number of shared concepts, they train a sparse autoencoder on ImageNet, and they conduct scaling studies on image and language tasks to identify the regimes where specialization is most effective.
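
Below is a minimal, hedged sketch of the TTT mechanism described above: a globally trained model is briefly fine-tuned on a handful of examples assumed to be related to the current test task before making a prediction. This is not the authors' implementation; the PyTorch model, the synthetic data, and the hyperparameters (`steps`, `lr`) are illustrative placeholders.

```python
# Minimal sketch of test-time training (TTT) as "specialization after
# generalization". This is NOT the paper's implementation: the model, the
# synthetic data, and the hyperparameters are placeholders chosen only to
# illustrate the mechanism (briefly fine-tuning a globally trained model on
# data related to the current test task before predicting).
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# A small "globally trained" network standing in for an underparameterized
# foundation model.
base_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# Placeholder task data: a few examples assumed to be related to the test
# point (e.g., retrieved nearest neighbors or the test task's support set).
task_x = torch.randn(16, 32)
task_y = torch.randint(0, 10, (16,))
test_x = torch.randn(1, 32)


def ttt_predict(model, task_x, task_y, test_x, steps=10, lr=1e-2):
    """Clone the global model, specialize it on task-related data, predict."""
    specialized = copy.deepcopy(model)      # keep the global model intact
    opt = torch.optim.SGD(specialized.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):                  # short specialization phase
        opt.zero_grad()
        loss_fn(specialized(task_x), task_y).backward()
        opt.step()
    with torch.no_grad():
        return specialized(test_x).argmax(dim=-1)


print(ttt_predict(base_model, task_x, task_y, test_x))
```

In the paper's framing, the point of the short specialization phase is that the cloned model spends its limited capacity on the concepts active in the test task rather than on the full training distribution.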

Takeaways, Limitations

Takeaways:
Shows that TTT is an effective way to improve the performance of underparameterized models.
Explains how TTT works through the linear representation hypothesis and verifies this explanation on real data (a toy sketch of this kind of check appears below).
Identifies, through scaling studies, the regimes where specialization is most effective.
Limitations:
The analysis deepens understanding of why this form of TTT is effective, but does not address other forms of TTT.
The sparse-autoencoder experiments may be limited to specific model architectures.
Further analysis is needed of the specific regimes in which specialization is effective.
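
As referenced in the Takeaways above, here is a toy, hedged sketch of the kind of check the paper performs with a sparse autoencoder. It does not use ImageNet or the paper's architecture; the synthetic feature vectors, the ReLU encoder with an L1 penalty, and all thresholds are illustrative assumptions. The intent is only to show how one might test whether semantically related inputs are explained by a small number of shared latent concepts.

```python
# Toy sparse-autoencoder check (illustrative only, not the paper's experiment):
# train an overcomplete autoencoder with an L1 sparsity penalty on synthetic
# "representations" built from a few shared concept directions, then inspect
# whether related samples activate a small, overlapping set of latent units.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, k = 64, 256  # input dimension, number of latent "concepts" (placeholders)

# Synthetic representations: every sample mixes the same 3 ground-truth
# concept directions, standing in for features of semantically related images.
concepts = torch.randn(8, d)
codes = torch.zeros(512, 8)
codes[:, :3] = torch.rand(512, 3)
x = codes @ concepts + 0.01 * torch.randn(512, d)

encoder = nn.Sequential(nn.Linear(d, k), nn.ReLU())
decoder = nn.Linear(k, d)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for _ in range(2000):
    opt.zero_grad()
    z = encoder(x)
    loss = ((decoder(z) - x) ** 2).mean() + 1e-3 * z.abs().mean()  # reconstruction + L1
    loss.backward()
    opt.step()

# Check sparsity and overlap: related samples should share a few active units.
with torch.no_grad():
    active = encoder(x) > 1e-2
print("avg. active units per sample:", active.float().sum(dim=1).mean().item())
print("units active in >90% of samples:", (active.float().mean(dim=0) > 0.9).sum().item())
```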