This paper considers learning a prompt-to-answer mapping in which a time-invariant generator, drawn from a base class of next-token generators, is iterated for multiple steps to produce a chain of thought, with the final token taken as the answer. We formulate the learning problem both when the chain of thought is observed and when it must be learned from prompt-answer pairs alone (i.e., when the thought process is latent), and we analyze sample and computational complexity both in terms of general properties of the base class (e.g., its VC dimension) and for specific base classes such as linear thresholds. We present a simple base class that admits a universally representable and computationally tractable chain of thought, and show that, owing to time-invariance, its sample complexity is independent of the length of the chain of thought. Attention arises naturally in our framework.
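To make the setup concrete, the following is a minimal sketch (hypothetical, not the paper's actual construction): a next-token generator from a linear-threshold base class is iterated for a fixed number of steps, the same map applied at every step (time-invariance), and the final token is read off as the answer. The weight vector, window size, and prompt below are illustrative assumptions.

```python
import numpy as np

def linear_threshold_generator(w):
    """Return a next-token map from a linear-threshold base class:
    emits 1 if w . context >= 0, else 0 (illustrative choice)."""
    def step(context):
        return 1 if np.dot(w, context) >= 0 else 0
    return step

def run_chain_of_thought(step, prompt, T, window):
    """Iterate the SAME generator T times (time-invariance): each emitted
    token is appended, and the generator reads a fixed-size window of the
    most recent tokens as its context."""
    tokens = list(prompt)
    for _ in range(T):
        context = np.array(tokens[-window:], dtype=float)
        tokens.append(step(context))
    # Return the full chain of thought and the final token as the answer.
    return tokens, tokens[-1]

# Hypothetical example: a generator over a window of 3 binary tokens.
w = np.array([1.0, -1.0, 1.0])
chain, answer = run_chain_of_thought(
    linear_threshold_generator(w), prompt=[1, 0, 1], T=5, window=3
)
```

Because the same step function is reused at every position, the hypothesis class being learned does not grow with the number of steps, which is the intuition behind the length-independent sample complexity claimed above.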