Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Reasoning Large Language Model Errors Arise from Hallucinating Critical Problem Features

Created by
  • Haebom

Author

Alex Heyman, Joel Zylberberg

Outline

This paper analyzes the causes of reasoning errors in Reasoning Large Language Models (RLLMs) trained with the Chain-of-Thought (CoT) strategy. The authors evaluate o1-mini, o3-mini, DeepSeek-R1, Claude 3.7 Sonnet, Gemini 2.5 Pro Preview, and Grok 3 Mini Beta on graph coloring, a constraint-satisfaction logic problem whose complexity can be varied. They find that a significant share of errors across all models stems from hallucinating graph edges that were never specified in the prompt. This hallucination persists across problem complexities and semantic framings, and small-scale experiments confirm that it generalizes to the stable matching problem. The study thus identifies misrepresentation of problem features as a broad class of RLLM failure and discusses design strategies to mitigate it.
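
For illustration, the sketch below shows one way such an evaluation could be set up: sample a random graph-coloring instance, present its edge list in a prompt, and flag any edge that the model's reasoning mentions but that is absent from the instance. This is a minimal sketch under stated assumptions, not the authors' actual harness; the prompt template, the "(u, v)" edge-mention regex, and the helper names (`make_instance`, `build_prompt`, `hallucinated_edges`) are hypothetical choices for illustration.

```python
import re
import random
from itertools import combinations

def make_instance(n_vertices=8, edge_prob=0.3, seed=0):
    """Sample a random undirected graph as a set of (u, v) vertex pairs."""
    rng = random.Random(seed)
    return {(u, v) for u, v in combinations(range(n_vertices), 2)
            if rng.random() < edge_prob}

def build_prompt(edges, n_colors=3):
    """Render the instance as an explicit edge list (assumed prompt format)."""
    edge_list = ", ".join(f"({u}, {v})" for u, v in sorted(edges))
    return (f"Color the vertices of the following graph with {n_colors} colors "
            f"so that no edge connects two vertices of the same color.\n"
            f"Edges: {edge_list}")

def hallucinated_edges(model_output, true_edges):
    """Return edges mentioned in the model's reasoning that are not in the instance.

    Assumes the reasoning refers to edges as "(u, v)" pairs; real transcripts
    would need more robust parsing.
    """
    mentioned = {tuple(sorted(map(int, m)))
                 for m in re.findall(r"\((\d+)\s*,\s*(\d+)\)", model_output)}
    true_norm = {tuple(sorted(e)) for e in true_edges}
    return mentioned - true_norm

if __name__ == "__main__":
    edges = make_instance()
    prompt = build_prompt(edges)
    # `model_output` would come from querying an RLLM with `prompt`.
    model_output = "Vertices 0 and 5 share edge (0, 5), so they need different colors."
    print(hallucinated_edges(model_output, edges))
```

A harness along these lines makes the paper's central measurement concrete: hallucination is detected by comparing the constraints the model reasons about against the constraints it was actually given, independent of whether the final coloring is correct.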

Takeaways, Limitations

Takeaways: A significant portion of reasoning errors in RLLMs stems from hallucinating information that conflicts with the input data. This has important implications for the development and use of RLLMs, and it points to the need for design strategies that address the misrepresentation of problem features.
Limitations: The conclusions are drawn from experiments on graph coloring and stable matching problems, so further research is needed to determine whether the results generalize to other types of problems. Empirical verification of the effectiveness of the proposed design strategies is also lacking.