Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, please cite the source.

Latent Concept Disentanglement in Transformer-based Language Models

Created by
  • Haebom

Authors

Guan Zhe Hong, Bhavya Vasudeva, Vatsal Sharan, Cyrus Rashtchian, Prabhakar Raghavan, Rina Panigrahy

Outline

When large language models (LLMs) solve novel tasks with in-context learning (ICL), they must infer latent concepts from the demonstration examples. This study uses mechanistic interpretability to explore how transformer models represent such latent structure. The results show that transformers successfully identify latent concepts and construct them step by step, and that, in tasks parameterized by latent numerical concepts, a low-dimensional subspace emerges in the model's representation space whose geometry reflects the underlying parameterization. Both small and large models are able to isolate and exploit latent concepts learned via ICL from only a handful of demonstrations.
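To make the subspace claim concrete, here is a minimal sketch of the kind of probing analysis described above: collect hidden states from prompts that vary a latent numerical concept, project them onto a few principal components, and check whether that low-dimensional subspace linearly encodes the latent parameter. The model choice (GPT-2), the prompt template, and the probing layer are illustrative assumptions, not the paper's actual setup.

```python
# Hedged sketch: probe whether a latent numerical concept k occupies a
# low-dimensional subspace of a transformer's hidden states.
import numpy as np
import torch
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

# Hypothetical ICL-style prompts parameterized by a latent number k:
# each demonstration applies "add k" to its input.
latent_values = list(range(1, 9))
prompts = [
    f"2 -> {2 + k} ; 5 -> {5 + k} ; 7 -> {7 + k} ; 3 ->" for k in latent_values
]

# Collect the last-token hidden state at a middle layer for each prompt.
layer = 6  # assumed probing layer, chosen for illustration
states = []
with torch.no_grad():
    for p in prompts:
        out = model(**tokenizer(p, return_tensors="pt"))
        states.append(out.hidden_states[layer][0, -1].numpy())
X = np.stack(states)

# Project onto a 2-D subspace and test whether it linearly encodes k.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)
reg = LinearRegression().fit(Z, latent_values)
print("explained variance:", pca.explained_variance_ratio_)
print("R^2 of k from 2-D subspace:", reg.score(Z, latent_values))
```

A high R^2 here would indicate that the latent parameter is linearly recoverable from a low-dimensional subspace, which is the flavor of geometric structure the paper reports; the actual paper's tasks and analysis pipeline may differ.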

Takeaways, Limitations

Takeaways:
We demonstrate that transformer models successfully identify and utilize latent concepts learned through ICL.
For latent numerical concepts, we find a low-dimensional subspace in the model's representation space that reflects the underlying parameterization.
Both small and large models can isolate and exploit latent concepts.
Limitations:
No specific limitations are stated in the paper.