Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models

Created by
  • Haebom

Author

Gagan Bhatia, Somayajulu G Sripada, Kevin Allan, Jacobo Azcona

Outline

This paper investigates the root causes of hallucination in large language models (LLMs). It (1) proposes the Distributional Semantics Tracing (DST) framework, which builds causal maps grounded in distributional semantics, treating meaning as a function of context; (2) identifies the specific layer (the commitment layer) at which a hallucination becomes inevitable; and (3) characterizes predictable failure modes, such as Reasoning Shortcut Hijacks, that arise from the conflict between System 1 (fast, associative reasoning) and System 2 (slow, deliberate reasoning). Measuring the consistency of contextual semantic paths with DST reveals a strong negative correlation (-0.863) with the rate of hallucination, suggesting that hallucinations are predictable consequences of inherent semantic weaknesses.

Takeaways, Limitations

Takeaways:
Provides a mechanistic understanding of the hallucination problem in LLMs.
Presents a new method, the DST framework, for tracing and analyzing a model's internal reasoning process.
Identifies the specific layers and computations where hallucination occurs, suggesting directions for model improvement.
Explains LLM failure mechanisms via dual-process theory and provides a basis for quantitative analysis.
Limitations:
The complexity and computational cost of the DST framework are not discussed.
Further research is needed to determine whether the proposed methodology generalizes across LLM architectures.
Concrete solutions for mitigating Reasoning Shortcut Hijacks are not provided.
The generalizability of the experimental results and the analysis of other types of failure modes remain open questions.