Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Created by
  • Haebom

Authors

Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar

Outline

This paper systematically investigates the performance, scaling behavior, and limitations of Large Reasoning Models (LRMs). Whereas previous studies focus mainly on final-answer accuracy, this study also analyzes the internal reasoning process, using a controllable puzzle environment whose complexity can be precisely adjusted. The experiments show that LRM accuracy collapses completely beyond a certain complexity. They also reveal a paradoxical scaling limit: reasoning effort increases with problem complexity up to a point, then declines even though token budget remains. By comparing LRMs with standard LLMs, the paper identifies three performance regimes corresponding to low-, medium-, and high-complexity tasks, and it exposes the limitations of LRMs in explicit computation and consistent reasoning. By analyzing the reasoning process, the paper examines solution-search patterns and the models' computational behavior, raising questions about the strengths and limitations of LRMs and their reasoning ability.
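To make the idea of a controllable puzzle environment concrete, here is a minimal sketch. It assumes Tower of Hanoi as the puzzle, which this summary does not name, and the function names `solve_hanoi` and `check_solution` are hypothetical, not taken from the authors' code. The number of disks `n` acts as the complexity knob, and a simulator checks every move a model proposes, so intermediate steps can be inspected and not only the final answer.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (source peg, target peg), pegs numbered 0..2


def solve_hanoi(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> List[Move]:
    """Return the optimal move list for n disks; its length is 2**n - 1."""
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, src, aux, dst)    # park n-1 disks on the spare peg
        + [(src, dst)]                       # move the largest disk
        + solve_hanoi(n - 1, aux, dst, src)  # bring the n-1 disks back on top
    )


def check_solution(n: int, moves: List[Move]) -> Tuple[bool, int]:
    """Simulate a proposed move list (hypothetical checker for illustration).

    Returns (solved, first_error), where first_error is the index of the
    first illegal move, or -1 if every move was legal.
    """
    # Each peg is a stack; the last element is the top disk. Disk n is largest.
    pegs = [list(range(n, 0, -1)), [], []]
    for i, (src, dst) in enumerate(moves):
        legal = (
            src in range(3)
            and dst in range(3)
            and src != dst
            and bool(pegs[src])
            and (not pegs[dst] or pegs[dst][-1] > pegs[src][-1])
        )
        if not legal:
            return False, i
        pegs[dst].append(pegs[src].pop())
    # Solved only if all n disks ended up on the last peg.
    return len(pegs[2]) == n, -1


if __name__ == "__main__":
    for n in range(1, 6):
        moves = solve_hanoi(n)
        print(n, len(moves), check_solution(n, moves))
```

Because the shortest solution grows as 2**n - 1 moves, raising `n` by one roughly doubles the required solution length, which is what makes the difficulty precisely adjustable in a setup like this.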

Takeaways, Limitations

Takeaways:
Presents a systematic method for evaluating LRMs using a controllable puzzle environment.
Shows that LRM accuracy collapses beyond a certain complexity and that reasoning effort hits a paradoxical scaling limit.
Identifies three performance regimes for LRMs, corresponding to low-, medium-, and high-complexity tasks.
Identifies the limits of LRMs in explicit computation and consistent reasoning.
Provides insight into the strengths and limitations of LRMs through analysis of their reasoning processes.
Limitations:
Further research is needed on how well results from controllable puzzle environments generalize.
The reasoning process of LRMs calls for more in-depth analysis.
No specific measures are proposed for overcoming the limitations of LRMs.