Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining

Created by
  • Haebom

Authors

Deniz Bayazit, Aaron Mueller, Antoine Bosselut

Outline

This paper presents a method for discovering and aligning features across model checkpoints using sparse crosscoders to understand when and how specific language abilities emerge during pretraining of large-scale language models (LLMs). We aim to overcome the limitations of existing benchmarking approaches and understand model training at a conceptual level. Specifically, we train crosscoders across three pairs of open-source checkpoints with significant performance and representational variation and introduce a novel metric, the relative indirect effect (RelIE), to track the training phases at which individual features become causally important for task performance. We demonstrate that this allows for the detection of feature emergence, retention, and disruption during pretraining. This architecture-independent and highly scalable method offers a promising path toward interpretable and fine-grained analysis of representation learning across pretraining.
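This summary does not give the exact definition of RelIE. As a loose, hypothetical illustration only (not the authors' formula), a relative-indirect-effect score over a checkpoint pair could be sketched as a normalized share of a feature's causal effect, where `ie_early` and `ie_late` stand in for per-feature indirect effects measured at the two checkpoints:

```python
def relie(ie_early: float, ie_late: float, eps: float = 1e-9) -> float:
    """Hypothetical relative indirect effect for one feature across a
    checkpoint pair: the share of the feature's total causal effect on
    task performance attributable to the later checkpoint.

    Values near 1.0 suggest the feature became causally important late
    in pretraining; values near 0.0 suggest it was already important early.
    """
    total = abs(ie_early) + abs(ie_late) + eps  # eps avoids division by zero
    return abs(ie_late) / total

# A feature whose causal effect appears only at the later checkpoint:
print(relie(0.0, 0.42))  # close to 1.0 -> late emergence
print(relie(0.3, 0.3))   # ~0.5 -> stable across the pair
```

Tracking such a score for each crosscoder-aligned feature across checkpoint pairs is what would let one label features as emerging, retained, or disrupted during pretraining.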

Takeaways, Limitations

Takeaways:
Deepens understanding of when and how specific linguistic abilities emerge during LLM pretraining.
Presents a novel analysis method combining sparse crosscoders with the RelIE metric.
Architecture-independent and scalable, so it can be applied to a wide range of models.
Improves the interpretability of the model training process.
Limitations:
Further validation of the accuracy and reliability of the RelIE metric is needed.
The method depends on the availability of suitable open-source checkpoints.
Interpreting the causal importance of features may involve some subjectivity.
Training crosscoders for very large models can be computationally expensive.