Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models

Created by
  • Haebom

Authors

Iuri Macocco, Nora Graichen, Gemma Boleda, Marco Baroni

Outline

This paper studies "outlier dimensions" in the final layer of language models, i.e., dimensions that exhibit extreme activations for the majority of inputs. We show that such outlier dimensions arise in many state-of-the-art language models and trace their function to a heuristic of constantly predicting frequent tokens. We further show that a model can block this heuristic when it is not contextually appropriate by assigning counterbalancing weight mass to the remaining dimensions, and we investigate which model parameters boost outlier dimensions and when they emerge during training. We conclude that outlier dimensions are a specialized mechanism discovered by many distinct models to implement a useful token-prediction heuristic.
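To make the notion of an outlier dimension concrete, here is a minimal detection sketch. It assumes a HuggingFace GPT-2 checkpoint, a few toy sentences, and an illustrative rule flagging dimensions whose mean absolute activation exceeds six times the median across dimensions; the paper's models and exact criterion may differ.

```python
# Minimal sketch: flag candidate outlier dimensions in GPT-2's last layer.
# Assumptions (not from the paper): gpt2 checkpoint, toy sentences,
# and a 6x-median magnitude threshold.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

sentences = [
    "The cat sat on the mat.",
    "Language models learn statistical regularities of text.",
    "Extreme activations can concentrate in a few dimensions.",
]

with torch.no_grad():
    per_token = []
    for s in sentences:
        inputs = tokenizer(s, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, d_model)
        per_token.append(hidden.squeeze(0))
    acts = torch.cat(per_token, dim=0)  # (total_tokens, d_model)

mean_abs = acts.abs().mean(dim=0)      # typical magnitude per dimension
threshold = 6 * mean_abs.median()      # illustrative outlier criterion
outliers = (mean_abs > threshold).nonzero(as_tuple=True)[0]
print("Candidate outlier dimensions:", outliers.tolist())
print("Mean |activation| of candidates:", mean_abs[outliers].tolist())
```

Running this on a larger, more representative text sample gives a more stable estimate; the handful of sentences here is only for demonstration.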

Takeaways, Limitations

Takeaways:
Identifies the existence and role of outlier dimensions in language models.
Shows how outlier dimensions implement a frequent-token prediction heuristic.
Explains how models block the heuristic when it is not contextually appropriate (see the ablation sketch after this list).
Analyzes which model parameters boost outlier dimensions and when they arise during training.
Limitations:
The analysis may not go deep into specific model architectures or training settings.
Further research is needed on how generally outlier dimensions influence model behavior.
Further verification is needed to assess the practical effect of the identified heuristic-blocking mechanism.
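The blocking takeaway above can be probed empirically. Below is a hedged ablation sketch: zero out one candidate outlier dimension in GPT-2's final hidden states and compare the probability assigned to a very frequent token before and after. The checkpoint, the dimension index OUTLIER_DIM, and the probe token " the" are illustrative assumptions, not the paper's actual setup or method.

```python
# Hedged ablation sketch: zero a candidate outlier dimension and compare
# next-token probabilities. OUTLIER_DIM and the probe token are placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

OUTLIER_DIM = 138  # placeholder: use an index flagged by the detection sketch

text = "After a long day at work, she went to"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # last_hidden_state is already past the final layer norm in GPT-2,
    # so applying lm_head directly yields next-token logits.
    hidden = model.transformer(**inputs).last_hidden_state
    logits = model.lm_head(hidden)[0, -1]

    ablated = hidden.clone()
    ablated[..., OUTLIER_DIM] = 0.0  # knock out the candidate dimension
    logits_ablated = model.lm_head(ablated)[0, -1]

probs = logits.softmax(-1)
probs_ablated = logits_ablated.softmax(-1)

the_id = tokenizer.encode(" the")[0]  # a very frequent token as a probe
print(f"P(' the') intact:  {probs[the_id].item():.4f}")
print(f"P(' the') ablated: {probs_ablated[the_id].item():.4f}")
```

A sharp drop in the frequent token's probability after ablation would be consistent with the paper's claim that the dimension carries the frequent-token heuristic; a negligible change would suggest the chosen index is not the relevant dimension.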