Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Entropy-Lens: The Information Signature of Transformer Computations

Created by
  • Haebom

Author

Riccardo Ali, Francesco Caso, Christopher Irwin, Pietro Liò

Outline

To improve the interpretability of Transformer models, this paper proposes Entropy-Lens, a framework that computes the Shannon entropy of the intermediate token distribution at each layer to produce an entropy profile of the model's computation. Instead of analyzing latent representations, the framework tracks how the token distribution evolves directly in vocabulary space, summarizing the model's computation from an information-theoretic perspective. These entropy profiles reveal the model's computational patterns and correlate with prompt type, task format, and output accuracy. Experiments across a range of Transformer models and Rényi entropy orders α verify the stability and generality of the entropy profiles. All of this is achieved without gradient computation, fine-tuning, or access to internal information within the model.
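The per-layer entropy profile described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes logit-lens-style access to a list of per-layer residual states (`hidden_states`) and an unembedding matrix (`W_U`); both names are hypothetical placeholders.

```python
import numpy as np

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy (in bits) of a probability vector."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log2(p)))

def entropy_profile(hidden_states, W_U):
    """Project each layer's hidden state into vocabulary space
    (logit-lens style) and record the entropy of the resulting
    token distribution: one value per layer."""
    profile = []
    for h in hidden_states:           # h: (d_model,) residual state at one layer
        logits = h @ W_U              # (vocab,) logits via the unembedding matrix
        z = logits - logits.max()     # shift for numerically stable softmax
        p = np.exp(z) / np.exp(z).sum()
        profile.append(shannon_entropy(p))
    return profile
```

High entropy early on (a near-uniform distribution over the vocabulary) collapsing to low entropy in later layers is the kind of "information signature" the paper summarizes per prompt and per task.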

Takeaways, Limitations

Takeaways:
Presents a new framework for analyzing the computational process of Transformer models through entropy profiles, an information-theoretic signature.
Model behavior can be understood and compared without access to the model's internal structure.
Entropy profiles carry information about model performance, correlating with output accuracy.
The approach applies to a variety of Transformer models and yields consistent results regardless of model size or architecture.
Limitations:
Entropy profiles may not capture all aspects of a model: because entropy is a summary statistic of information content, it cannot provide detailed insight into the underlying computational or decision-making processes.
The analysis is based on Shannon entropy, so results may differ under other information measures. Although the paper reports similar results in experiments with Rényi entropy, this cannot be generalized to all cases.
No specific figures are given for prediction accuracy across prompt types or task formats.
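The Rényi entropy mentioned above generalizes Shannon entropy with an order parameter α; the limit α → 1 recovers Shannon entropy, which is why varying α is a natural robustness check. A small sketch of the definition (illustrative, not the paper's code):

```python
import numpy as np

def renyi_entropy(p, alpha, eps=1e-12):
    """Renyi entropy of order alpha (in bits) of a probability vector.
    Undefined at alpha = 1, but the limit alpha -> 1 is Shannon entropy."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))
```

For a uniform distribution the Rényi entropy equals log2 of the support size for every α, while for skewed distributions larger α weights the dominant tokens more heavily; agreement of profiles across α values is what the paper uses to argue the signature is not an artifact of one particular measure.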