Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Created by
  • Haebom

Author

Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei Li, Bohan Jiang, Yancheng Wang, Yingzhen Yang, Huan Liu

Outline

This paper analyzes the performance improvements that Chain-of-Thought (CoT) prompting brings to Large Language Models (LLMs) from a data-distribution perspective. The authors investigate whether CoT reasoning reflects structured inductive biases learned from the training data, enabling conditional generation that approximates the reasoning paths observed during training. To test this, they design DataAlchemy, a controlled environment in which LLMs are trained from scratch and systematically probed under varying distributional conditions. CoT reasoning is analyzed along three dimensions: task, length, and format. The results show that CoT reasoning is a fragile phenomenon that vanishes outside the training distribution, highlighting the difficulty of achieving truly generalizable reasoning.
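The distributional probe described above can be illustrated with a toy sketch. Everything below is hypothetical: the function names, the synthetic "+1 / *2" task, and the specific lengths are illustrative assumptions, not the paper's actual DataAlchemy code. The idea is simply that training data covers one range of reasoning-chain lengths while the test set shifts outside it.

```python
# Hypothetical sketch of a DataAlchemy-style distribution probe: train an
# LLM from scratch on synthetic reasoning chains, then test whether CoT
# survives when queries drift from the training distribution (here, a
# shift along the *length* dimension; task and format shifts are analogous).
import random

def make_chain(length, ops=("+1", "*2")):
    """Build one synthetic reasoning chain: a start value plus `length` steps."""
    x = random.randint(0, 9)
    steps, val = [], x
    for op in random.choices(ops, k=length):
        val = val + 1 if op == "+1" else val * 2
        steps.append((op, val))
    return {"input": x, "steps": steps, "answer": val, "length": length}

def build_splits(n=1000, train_lengths=(2, 3, 4), ood_length=6):
    """In-distribution training set vs. a length-shifted OOD test set."""
    train = [make_chain(random.choice(train_lengths)) for _ in range(n)]
    ood_test = [make_chain(ood_length) for _ in range(n // 10)]
    return train, ood_test

train, ood_test = build_splits()
```

A model trained only on `train` would then be evaluated on `ood_test`; the paper's finding is that CoT-style step generation degrades sharply on such shifted queries.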

Takeaways, Limitations

Takeaways: By exposing the limitations of CoT reasoning from a data-distribution perspective, the paper deepens our understanding of the reasoning capabilities of LLMs. It suggests that the effectiveness of CoT prompting is fundamentally bounded by the degree of distributional mismatch between training data and test queries, and it presents a methodology for systematically analyzing LLM reasoning mechanisms in a controlled environment such as DataAlchemy.
Limitations: The findings rest on experiments under the specific conditions of the DataAlchemy environment, so further research is needed to establish how well they generalize to complex real-world settings. The analysis covers only certain dimensions (task, length, format), and other important factors may not have been considered. Fully elucidating the inherent limits of CoT reasoning remains open.