We present a theoretical framework for applying the minimum description length (MDL) principle to machine learning, addressing the lack of a universal measure of model complexity for neural networks such as the Transformer. We introduce the notion of an asymptotically optimal description length objective, grounded in Kolmogorov complexity theory, and show that minimizing this objective achieves optimal compression of any dataset, up to additive constants, as model resources increase. We then establish the computational universality of the Transformer, which implies that such an asymptotically optimal objective exists for Transformer models. Furthermore, we construct and analyze a variational objective based on an adaptive Gaussian mixture prior and show that it is both practical and differentiable. Our experiments indicate that this variational objective selects for low-complexity solutions with strong generalization on algorithmic tasks, yet standard optimizers fail to find such solutions from random initialization, highlighting a key optimization challenge. More broadly, by providing a theoretical framework for identifying description length objectives with strong asymptotic guarantees, we suggest a potential path toward neural network training that achieves better compression and generalization.
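To make the flavor of such an objective concrete, the sketch below shows one way a differentiable description-length penalty with an adaptive Gaussian mixture prior over weights can be added to a task loss. This is a minimal illustration under our own assumptions, not the paper's exact variational objective; the class name `GaussianMixturePrior`, the method `dl_penalty`, and the penalty coefficient are illustrative choices.

```python
# Minimal sketch (assumption, not the paper's objective): penalize a model's weights
# by their negative log-probability under a learnable Gaussian mixture prior, giving
# a differentiable complexity term that can be minimized jointly with the task loss.
import math
import torch
import torch.nn as nn


class GaussianMixturePrior(nn.Module):
    def __init__(self, num_components: int = 4):
        super().__init__()
        # Adaptive (learnable) mixture parameters: logits, means, log-variances.
        self.logits = nn.Parameter(torch.zeros(num_components))
        self.means = nn.Parameter(torch.linspace(-0.5, 0.5, num_components))
        self.log_vars = nn.Parameter(torch.zeros(num_components))

    def dl_penalty(self, weights: torch.Tensor) -> torch.Tensor:
        """Negative log-likelihood (in nats) of the flattened weights under the mixture."""
        w = weights.reshape(-1, 1)                      # shape (n, 1)
        log_pi = torch.log_softmax(self.logits, dim=0)  # shape (k,)
        var = self.log_vars.exp()
        # Per-component Gaussian log-densities, then log-sum-exp over components.
        log_comp = -0.5 * ((w - self.means) ** 2 / var
                           + self.log_vars + math.log(2 * math.pi))
        log_mix = torch.logsumexp(log_pi + log_comp, dim=1)  # shape (n,)
        return -log_mix.sum()


# Usage sketch: model and prior parameters would be optimized jointly, so the prior
# adapts to the weight distribution while the weights are pulled toward short codes.
model = nn.Linear(16, 16)
prior = GaussianMixturePrior()
x, y = torch.randn(8, 16), torch.randn(8, 16)
task_loss = nn.functional.mse_loss(model(x), y)
all_weights = torch.cat([p.reshape(-1) for p in model.parameters()])
loss = task_loss + 1e-3 * prior.dl_penalty(all_weights)  # 1e-3 is an arbitrary coefficient
loss.backward()
```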