Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Theory of Learning with Autoregressive Chain of Thought

Created by
  • Haebom

Authors

Nirmit Joshi, Gal Vardi, Adam Block, Surbhi Goel, Zhiyuan Li, Theodor Misiakiewicz, Nathan Srebro

Outline

This paper studies learning prompt-to-answer mappings obtained by iterating a fixed, time-invariant generator from a given base class for multiple steps, thereby producing a chain of thought whose final token is taken as the answer. The learning problem is formulated both when the chain of thought is observed during training and when only prompt-answer pairs are available (the chain of thought is latent). Sample and computational complexity are analyzed both in terms of general properties of the base class (e.g., its VC dimension) and for specific base classes such as linear thresholds. The paper presents a simple base class that allows for universal representability and computationally tractable chain-of-thought learning; because the generator is time-invariant, the sample complexity is independent of the length of the chain of thought. Attention arises naturally in this construction.
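To make the generation process concrete, here is a minimal sketch (not code from the paper) of autoregressive chain-of-thought generation with a time-invariant generator: the same next-token map is applied at every step, and the answer is read off as the final token. The names (`generate_answer`, `step`) and the integer token representation are hypothetical choices for this illustration.

```python
from typing import Callable, List, Sequence

Token = int  # tokens represented as integers (an assumption of this sketch)

def generate_answer(
    prompt: Sequence[Token],
    step: Callable[[Sequence[Token]], Token],  # time-invariant next-token generator
    num_steps: int,
) -> Token:
    """Iterate the same generator `step` for `num_steps` chain-of-thought tokens
    and return the final token as the answer."""
    sequence: List[Token] = list(prompt)
    for _ in range(num_steps):
        sequence.append(step(sequence))  # the identical map at every step: time invariance
    return sequence[-1]  # the answer is the last generated token

# Toy usage: a generator that emits the sum of the last two tokens mod 10.
if __name__ == "__main__":
    toy_step = lambda seq: (seq[-1] + seq[-2]) % 10
    print(generate_answer(prompt=[1, 2], step=toy_step, num_steps=5))
```

Because the same `step` map is reused at every position, the hypothesis class does not grow with the number of reasoning steps, which is the intuition behind the length-independent sample complexity highlighted below.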

Takeaways, Limitations

Takeaways:
  • Showing that the sample complexity of learning with a time-invariant generator is independent of the length of the chain of thought suggests that long thought processes can be learned efficiently.
  • A simple base class is proposed that is universally representable and admits computationally tractable chain-of-thought learning.
  • The framework gives rise to attention mechanisms naturally.
Limitations:
  • The proposed base class lacks experimental validation of its practical performance and generalization ability.
  • It is unclear how restrictive the time-invariance assumption is when applied to real-world problems.
  • Further research is needed to model more complex thought processes effectively.