Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Unsupervised Hallucination Detection by Inspecting Reasoning Processes

Created by
  • Haebom

Authors

Ponhvoan Srey, Xiaobao Wu, Anh Tuan Luu

Outline

This paper proposes IRIS, an unsupervised hallucination detection method that identifies false information generated by large language models (LLMs) without labeled data. Existing unsupervised methods rely on proxy metrics that are not directly tied to factual accuracy, so they generalize poorly across datasets and contexts. IRIS addresses this by leveraging internal representations related to factual accuracy: the LLM is prompted to verify the truth of a given statement, and the resulting contextualized embeddings serve as features for training. The uncertainty of each response serves as a soft pseudo-label for truthfulness. Experiments show that IRIS outperforms existing unsupervised methods. Because it is fully unsupervised, computationally inexpensive, and effective even with little training data, it is suited to real-time detection.
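The idea of training on embeddings with soft pseudo-labels can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic-regression probe, the function names `train_soft_label_probe` and `truth_score`, and every hyperparameter are assumptions made for illustration. Synthetic vectors stand in for the LLM's contextualized embeddings, and noisy confidence values stand in for the uncertainty-derived pseudo-labels.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_soft_label_probe(features, soft_labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe by gradient descent on binary
    cross-entropy, where targets are soft values in [0, 1] instead of 0/1."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, soft_labels):
            # Gradient of cross-entropy w.r.t. the logit is (prediction - target).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def truth_score(x, w, b):
    """Probe output: estimated probability that the statement is true."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Synthetic stand-ins (assumption): real features would be embeddings taken
# from the LLM while it verifies a statement.
rng = random.Random(0)
dim, n = 8, 200
direction = [rng.gauss(0, 1) for _ in range(dim)]
is_true = [rng.random() < 0.5 for _ in range(n)]
features = [
    [rng.gauss(0, 1) + (1 if t else -1) * dj for dj in direction]
    for t in is_true
]
# Soft pseudo-labels (assumption): noisy confidence near 0.8 for true
# statements and near 0.2 for false ones. No ground-truth label is used
# during training.
soft_labels = [
    min(1.0, max(0.0, (0.8 if t else 0.2) + rng.gauss(0, 0.1)))
    for t in is_true
]

w, b = train_soft_label_probe(features, soft_labels)
scores = [truth_score(x, w, b) for x in features]
# Ground truth is used here only to evaluate the probe on the toy data.
accuracy = sum((s > 0.5) == t for s, t in zip(scores, is_true)) / n
print(f"toy accuracy: {accuracy:.2f}")
```

A small linear probe keeps both training and inference cheap, which fits the summary's point about low computational cost. The paper's actual classifier and labeling procedure may differ from this sketch.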

Takeaways, Limitations

Takeaways:
It overcomes the limitations of existing unsupervised methods by using internal representations related to factual accuracy.
It enables effective real-time hallucination detection with low computational cost and a small amount of training data.
It outperforms existing unsupervised hallucination detection methods.
Limitations:
Because the method depends heavily on the LLM's internal representations, performance may degrade when the LLM's architecture changes.
Generalization across different types of hallucinations needs further validation.
The limitations of using soft pseudo-labels need further analysis.