Autoregressive language models achieve impressive performance, yet we lack a unified theory of their internal mechanisms: how training shapes their representations and how those representations enable complex behaviors. This paper introduces an analytical framework that models single-step generation as a composition of information-processing stages, formalized in the language of Markov categories. This compositional perspective supplies a single mathematical vocabulary connecting three aspects of language modeling that are usually studied in isolation: the training objective, the geometry of the learned representation space, and the model's practical capabilities. First, the framework gives a precise information-theoretic account of why multi-token prediction methods such as speculative decoding succeed, quantifying the information surplus that a model's hidden state carries about tokens beyond the immediate next one. Second, it clarifies how the standard Negative Log-Likelihood (NLL) objective compels the model to learn not only the next token but also the data's intrinsic conditional uncertainty, a requirement we formalize via categorical entropy. Our central result shows that, for a model with a linear softmax head and bounded features, NLL minimization induces spectral alignment: the learned representation space aligns with the spectrum of the prediction similarity operator. Together, these results offer new insight into how information flows through a model and how the training objective shapes its internal geometry.
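As an illustrative aside (the notation here is ours and is not fixed by the abstract above), the claim that NLL minimization forces the model to capture the data's conditional uncertainty follows from the standard decomposition of the expected NLL for a context $c$, with data distribution $p(\cdot \mid c)$ and model distribution $q_\theta(\cdot \mid c)$:
\[
\mathbb{E}_{x \sim p(\cdot \mid c)}\!\left[ -\log q_\theta(x \mid c) \right]
= H\!\left( p(\cdot \mid c) \right)
+ D_{\mathrm{KL}}\!\left( p(\cdot \mid c) \,\middle\|\, q_\theta(\cdot \mid c) \right).
\]
The loss attains its minimum, the conditional entropy $H(p(\cdot \mid c))$, only when $q_\theta$ reproduces the data's entire conditional distribution rather than merely its most likely token; the categorical-entropy formalization in the paper makes this requirement precise within the compositional framework.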