Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
The summaries are generated using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Learning to Reason for Hallucination Span Detection

Created by
  • Haebom

Authors

Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Kundan Krishna, Hadi Pouransari, Cheng-Yu Hsieh, Cem Koc, Joseph Yitan Cheng, Oncel Tuzel, Raviteja Vemulapalli

RL4HS, a reinforcement learning-based framework for detecting hallucination spans in LLM outputs

Outline

Large language models (LLMs) often generate hallucinations, i.e., unsupported content that reduces reliability. While most existing research treats hallucination detection as a binary classification problem, real-world applications require identifying the specific spans that are hallucinated, which makes detection a multi-step decision-making process. To address this, we evaluated pretrained models with Chain-of-Thought (CoT) reasoning and confirmed that, over multiple samplings, CoT reasoning can produce at least one correct span prediction. Based on this observation, we propose RL4HS, a reinforcement learning framework that encourages reasoning through a span-level reward function. RL4HS builds on Group Relative Policy Optimization and introduces Class-Aware Policy Optimization to mitigate the reward imbalance problem. Experiments on the RAGTruth benchmark (summarization, question answering, and data-to-text generation) show that RL4HS outperforms pretrained reasoning models and supervised fine-tuning, demonstrating the importance of reinforcement learning with span-level rewards for detecting hallucination spans.
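To make the idea of a span-level reward concrete, here is a minimal sketch that assumes predicted and gold hallucination spans are given as character offsets and scores them with a simple overlap-based F1. This is only an illustration of the general idea, not the exact reward function defined in the paper.

```python
# Minimal sketch of a span-level reward (assumption: spans are
# (start, end) character offsets, end exclusive). Illustrative only;
# not the exact RL4HS reward.

def span_f1_reward(pred_spans, gold_spans):
    """Return an F1-style reward in [0, 1] from predicted vs. gold spans."""
    pred_chars = set()
    for start, end in pred_spans:
        pred_chars.update(range(start, end))
    gold_chars = set()
    for start, end in gold_spans:
        gold_chars.update(range(start, end))

    # Both empty: the model correctly predicted "no hallucination".
    if not pred_chars and not gold_chars:
        return 1.0
    # One side empty: no overlap is possible.
    if not pred_chars or not gold_chars:
        return 0.0

    overlap = len(pred_chars & gold_chars)
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example: one predicted span partially overlaps the gold span.
print(span_f1_reward([(10, 25)], [(15, 30)]))  # ~0.667
```

Because such a reward is computed on whole spans rather than on a single yes/no label, it gives the policy graded feedback on partially correct predictions, which is what motivates optimizing it with reinforcement learning rather than supervised classification.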

Takeaways, Limitations

Takeaways:
An effective reinforcement learning framework (RL4HS) for LLM hallucination span detection is presented.
Confirms the potential of CoT reasoning for this task and motivates reinforcement learning built on it.
Improves hallucination span detection performance by using a span-level reward function.
Ensures training stability and mitigates reward imbalance through Group Relative Policy Optimization and Class-Aware Policy Optimization (see the sketch after the Limitations list below).
Demonstrates superior performance compared to existing models on the RAGTruth benchmark.
Limitations:
Evaluates performance only on a specific benchmark dataset (RAGTruth).
Further research is needed to determine the generalizability of RL4HS and its applicability to other domains.
Lack of analysis of model complexity and computational cost.
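As referenced in the takeaways above, the paper combines Group Relative Policy Optimization with Class-Aware Policy Optimization to counteract reward imbalance. The summary does not give the exact formulation, so the snippet below is a hypothetical illustration: a standard GRPO-style group-relative advantage with an added per-class scaling factor. The class labels and weights are assumptions for the example.

```python
# Sketch of group-relative advantages with a class-aware weight.
# Assumptions: per-sample scalar rewards and a class label per example
# (e.g., hallucinated vs. clean). The class scaling is hypothetical and
# not the exact Class-Aware Policy Optimization from the paper.

import numpy as np

def class_aware_group_advantages(rewards, classes, class_weights):
    """Return GRPO-style group-relative advantages, scaled per class.

    rewards: shape (G,) array, one reward per sampled response.
    classes: length-G sequence of class labels.
    class_weights: dict mapping class label -> scaling factor.
    """
    rewards = np.asarray(rewards, dtype=float)
    # Standard group-relative normalization used by GRPO.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Hypothetical class-aware scaling to rebalance the classes.
    weights = np.array([class_weights[c] for c in classes])
    return advantages * weights


# Example usage: up-weight one class to counteract reward imbalance
# (labels and weights are arbitrary placeholders).
rewards = [0.9, 0.2, 0.6, 0.1]
classes = ["clean", "halluc", "clean", "halluc"]
print(class_aware_group_advantages(rewards, classes,
                                   {"clean": 1.0, "halluc": 1.5}))
```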