Daily Arxiv

This page curates papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, please cite the source.

Understanding In-context Learning of Addition via Activation Subspaces

Created by
  • Haebom

Authors

Xinyan Hu, Kayo Yin, Michael I. Jordan, Jacob Steinhardt, Lijie Chen

Outline

This paper investigates how prediction rules are implemented in the forward pass of a language model performing few-shot learning. The authors study a few-shot task whose prediction rule adds an integer $k$ to the input, and propose a novel optimization method that localizes the model's few-shot ability to a small number of attention heads. Through dimensionality reduction and decomposition, they analyze individual heads in detail and, taking Llama-3-8B-Instruct as a case study, reduce the model's mechanism to three attention heads and a six-dimensional subspace. They further derive mathematical identities connecting the "aggregate" and "extract" subspaces of the attention heads, which makes it possible to trace how information flows from individual examples into the final aggregated concept. This analysis reveals a self-correction mechanism in which mistakes induced by early demonstrations are suppressed by later demonstrations.
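To make the task concrete, below is a minimal sketch of the few-shot addition setup, querying Llama-3-8B-Instruct through the Hugging Face transformers API. The "x -> x+k" prompt format and the 4-shot setting are illustrative assumptions, not the paper's exact protocol.

```python
# A minimal sketch of the "add k" few-shot task. The prompt format is an
# assumption for illustration; the paper's exact setup may differ.
# Requires access to the Llama-3-8B-Instruct weights.
import random

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # model analyzed in the paper


def make_prompt(k: int, n_shots: int = 4, lo: int = 0, hi: int = 99):
    """Build an n-shot prompt whose hidden rule is 'add k to the input'."""
    lines = [f"{x} -> {x + k}" for x in random.sample(range(lo, hi + 1), n_shots)]
    query = random.randint(lo, hi)
    lines.append(f"{query} -> ")
    return "\n".join(lines), query + k


tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

prompt, target = make_prompt(k=7)
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=4, do_sample=False)
pred = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"{prompt}{pred.strip()}   (target: {target})")
```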

Takeaways, Limitations

Takeaways:
  • The few-shot learning ability of a language model can be localized to a small set of attention heads.
  • Low-dimensional subspace analysis of attention heads exposes the model's fine-grained computational structure (see the sketch after the Limitations list).
  • Relating the "aggregate" and "extract" subspaces makes it possible to trace information flow and explains the self-correction mechanism.
  • The work offers a novel analytical framework for understanding the complex inner workings of language models.
Limitations:
  • Generalization may be limited, since the study focuses on a single type of few-shot task (integer addition).
  • Further research is needed to determine whether the findings for Llama-3-8B-Instruct carry over to other models.
  • Because the analysis is restricted to a small number of attention heads, contributions from other model components may be overlooked.
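As a rough illustration of the low-dimensional subspace analysis mentioned in the takeaways, the sketch below recovers a head's dominant subspace with PCA via SVD. PCA here is a generic stand-in for the paper's actual decomposition, the activations are synthetic, and only the six-dimensional target and the 4096-dimensional hidden size of Llama-3-8B come from the paper and the model.

```python
# A generic sketch of subspace analysis on attention-head outputs (hedged):
# PCA via SVD stands in for the paper's decomposition; the data are synthetic.
import numpy as np


def top_subspace(head_outputs: np.ndarray, dim: int = 6) -> np.ndarray:
    """head_outputs: (n_prompts, d_model) array, e.g. one head's output at the
    final token of many few-shot prompts. Returns an orthonormal basis
    (d_model, dim) for the top principal subspace."""
    centered = head_outputs - head_outputs.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt = PCs
    captured = (s[:dim] ** 2).sum() / (s ** 2).sum()
    print(f"variance captured by top {dim} dims: {captured:.1%}")
    return vt[:dim].T


rng = np.random.default_rng(0)
d_model, n_prompts = 4096, 512  # d_model matches Llama-3-8B's hidden size
# Synthetic outputs: a strong signal in a random 6-d subspace plus small noise.
signal_basis = np.linalg.qr(rng.normal(size=(d_model, 6)))[0]
outputs = rng.normal(size=(n_prompts, 6)) @ signal_basis.T * 5.0
outputs += rng.normal(size=(n_prompts, d_model)) * 0.1

basis = top_subspace(outputs, dim=6)
coords = outputs @ basis  # (n_prompts, 6) coordinates inside the subspace
```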