Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Hallucinated Span Detection with Multi-View Attention Features

Created by
  • Haebom

Authors

Yuya Ogasa, Yuki Arase

Outline

This paper addresses the problem of detecting hallucinated spans in the output of large language models. Span-level detection has received less attention than hallucination detection at the level of the whole output, yet it is crucial in practice. Previous research has shown that attention often exhibits irregular patterns when hallucinations occur. Building on this, the authors extract features from the attention matrix that provide complementary views of (a) whether specific tokens are influential or ignored, (b) whether attention is biased toward a specific subset of tokens, and (c) whether a token is generated with reference to a narrow or a broad context. These features are fed into a Transformer-based classifier that labels each token in sequence to identify hallucinated spans. Experimental results show that the proposed method outperforms strong baselines at detecting hallucinated spans in tasks with long input contexts, such as data-to-text generation and summarization.
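To make the three views (a) to (c) more concrete, here is a minimal sketch of per-token statistics that could be computed from a single causal attention matrix. The specific statistics (mean received attention, peak outgoing weight, normalized row entropy) and the function name `attention_features` are assumptions chosen for illustration; they are not the feature definitions used in the paper.

```python
import math


def attention_features(attn):
    """Compute three illustrative per-token statistics from a causal attention matrix.

    attn[i][j] is the weight token i places on token j (j <= i); each row is
    assumed to sum to 1. Returns one (received, peak, breadth) tuple per token.
    These are hypothetical stand-ins for views (a), (b), and (c).
    """
    n = len(attn)
    features = []
    for i in range(n):
        # (a) Influential or ignored: mean attention that token i receives
        # from itself and every later token.
        incoming = [attn[k][i] for k in range(i, n)]
        received = sum(incoming) / len(incoming)

        # Attention that token i pays to itself and to earlier tokens.
        row = attn[i][: i + 1]

        # (b) Bias toward a subset: the largest single outgoing weight.
        peak = max(row)

        # (c) Narrow or broad context: entropy of the outgoing weights,
        # normalized to [0, 1] by the maximum possible entropy for this row.
        entropy = sum(-p * math.log(p) for p in row if p > 0)
        max_entropy = math.log(len(row)) if len(row) > 1 else 1.0
        breadth = entropy / max_entropy

        features.append((received, peak, breadth))
    return features


if __name__ == "__main__":
    # Toy 3-token attention matrix, made up for this example.
    toy = [
        [1.0, 0.0, 0.0],
        [0.5, 0.5, 0.0],
        [0.1, 0.1, 0.8],
    ]
    for index, feats in enumerate(attention_features(toy)):
        print(index, feats)
```

In a real setting, statistics like these would be computed for every layer and head and stacked into one feature vector per token before being passed to a token-level classifier. How the paper aggregates across layers and heads is not stated in this summary.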

Takeaways, Limitations

Takeaways:
Presents a novel method that detects hallucinated spans in large language model output by analyzing the attention matrix.
Outperforms strong baselines on tasks with long input contexts (e.g., data-to-text generation, summarization).
Provides insight into how hallucinations arise through attention-based feature extraction.
Limitations:
Further verification of the generalization performance of the proposed method is needed.
Experiments on a wider range of large language models and tasks are needed.
Further research is needed to improve the accuracy of hallucinated span detection.