
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans

Created by
  • Haebom

Author

Masha Fedzechkina, Eleonora Gualdoni, Sinead Williamson, Katherine Metcalf, Skyler Seto, Barry-John Theobald

Outline

This paper presents a novel approach to studying how well the representations of large language models (LLMs) align with human representations. We use activation steering to identify neurons associated with specific concepts (e.g., "cat") and analyze their activation patterns. We show that the LLM representations captured in this way are highly similar to human representations inferred from behavioral data, and that the alignment is on par with human-to-human agreement. This alignment is much higher than that obtained with the word embeddings used in previous studies, demonstrating that LLMs organize concepts in a human-like manner.
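The sketch below illustrates the general idea of measuring this kind of representational alignment, not the authors' actual pipeline: the activations are random placeholders, the neuron-selection rule is a crude simplification of activation steering, and the RSA-style Spearman comparison is an assumption about how model and human similarity structures might be compared.

```python
# Minimal sketch (assumed, not the paper's implementation) of comparing an
# LLM's concept-level similarity structure to human similarity judgments.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

concepts = ["cat", "dog", "car", "apple"]
n_neurons = 512

# Placeholder activations: mean hidden-unit activations for prompts that do /
# do not mention each concept (a real study would read these from an LLM).
act_with = rng.normal(size=(len(concepts), n_neurons))
act_without = rng.normal(size=(len(concepts), n_neurons))

# 1) Identify "concept neurons": units whose activation shifts most when the
#    concept is present (a stand-in for activation-steering-based selection).
diff = np.abs(act_with - act_without).mean(axis=0)
concept_neurons = np.argsort(diff)[-64:]

# 2) Build the model's concept-by-concept similarity matrix from those neurons.
model_repr = act_with[:, concept_neurons]
model_sim = 1 - squareform(pdist(model_repr, metric="cosine"))

# 3) Placeholder human similarity matrix (e.g., inferred from behavioral data).
human_sim = 1 - squareform(pdist(rng.normal(size=(len(concepts), 8)), metric="cosine"))

# 4) Representational alignment: correlate the off-diagonal entries (RSA-style).
iu = np.triu_indices(len(concepts), k=1)
rho, _ = spearmanr(model_sim[iu], human_sim[iu])
print(f"Spearman alignment between model and human similarity: {rho:.2f}")
```

With real LLM activations and human behavioral data in place of the random arrays, the same comparison could in principle be run against word-embedding similarities as a baseline.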

Takeaways, Limitations

Takeaways:
We present a novel method to quantitatively measure the level of alignment between LLM representations and human representations.
The alignment between LLM and human representations is verified at a higher level than is achievable with word embeddings.
We demonstrate that LLMs organize concepts in a human-like manner.
A more detailed analysis of the conceptual representations of LLMs becomes possible.
Limitations:
Because the method relies on activation steering, the accuracy of neuron identification for specific concepts still needs to be verified.
The analysis may be limited to specific LLMs and specific concepts. Further research is needed to determine generalizability.
The limitations of inferring human representations from behavioral data should also be considered.