Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers

Created by
  • Haebom

Authors

Clement Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West

Outline

This paper investigates whether multilingual large language models (LLMs) contain a universal concept representation that is independent of language. In a transformer-based LLM, the authors analyze latent representations (latents) during word translation tasks, extracting latents from a source translation prompt and inserting them into the forward pass of a target translation prompt. They find that the output language is encoded in the latents at an earlier layer than the concept to be translated. Building on this insight, they show that activation patching can change the concept while preserving the language, and vice versa. They also show that patching concepts with representations averaged across languages does not harm the model's translation ability and can even improve it. Finally, they generalize to multi-token generation, showing that the model can produce natural language descriptions of these averaged representations. The results provide evidence for a language-independent concept representation in the investigated model.
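The patching procedure described above can be sketched with a toy stand-in: run the model on a source prompt, record a hidden activation, then overwrite the same activation during a run on the target prompt. This is a minimal illustration of the idea, not the authors' code; the two-layer "model" and all values here are hypothetical.

```python
# Toy stand-in for a transformer: two "layers", each a simple function.
# layer1 doubles its input; layer2 adds one. Both are hypothetical.
def layer1(x):
    return [2 * v for v in x]

def layer2(h):
    return [v + 1 for v in h]

def forward(x, patch=None):
    """Run the toy model; optionally overwrite the layer-1 activation."""
    h1 = layer1(x)
    if patch is not None:
        h1 = patch  # activation patching: splice in a foreign latent
    return layer2(h1), h1

source = [1.0, 2.0]  # stands in for the source translation prompt
target = [5.0, 6.0]  # stands in for the target translation prompt

# 1) Clean run on the source prompt: capture its latent.
_, source_latent = forward(source)
# 2) Patched run on the target prompt: insert the source latent.
patched_out, _ = forward(target, patch=source_latent)
# 3) Clean run on the target prompt, for comparison.
clean_out, _ = forward(target)
# The patched output now reflects the source latent, not the target input.
```

In a real transformer the patch would target a specific layer and token position (e.g. via a forward hook), and the paper's finding is that which layer you patch determines whether you transfer the language or the concept.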

Takeaways, Limitations

Takeaways: The paper provides evidence that a universal, language-independent concept representation exists in a multilingual LLM. It shows that language and concept can be manipulated independently through activation patching, and that patching with concept representations averaged across languages can improve translation performance. By demonstrating that the model can generate natural language descriptions of these averaged representations, it also improves the interpretability of concept representations.
Limitations: The results are limited to a specific transformer-based LLM and to word translation tasks; generalization to other model families or tasks requires further study. The exact mechanism behind the concept representation is not explained in detail, and a fuller description of how the latent representations were selected and extracted for the analysis would be helpful.
👍