This paper argues that the success of diffusion models is largely due to input conditioning. Accordingly, we investigate the representations used to condition diffusion models, with the view that an ideal representation should improve sample fidelity, be easy to generate, and be composable to allow the generation of samples beyond the training distribution. We introduce Discrete Latent Codes (DLCs), derived from Simplicial Embeddings trained with a self-supervised learning objective. Unlike standard continuous image embeddings, DLCs are sequences of discrete tokens. They are easy to generate, and their composability enables the sampling of novel images beyond the training distribution. Diffusion models trained with DLCs achieve improved generation fidelity, establishing a new state of the art for unconditional image generation on ImageNet. We also show that composing DLCs enables the image generator to produce out-of-distribution samples that coherently combine the semantics of images in diverse ways. Finally, we demonstrate how DLCs enable text-to-image generation by leveraging large pre-trained language models. We efficiently fine-tune a text diffusion language model to generate DLCs that produce novel samples outside the training distribution of the image generator.
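
As a rough illustration of the idea described above, the sketch below conditions a toy denoiser on a DLC, i.e. a sequence of discrete tokens, and composes two DLCs token-wise to request a sample outside the training distribution. This is not the implementation used in this work: the sequence length, codebook size, MLP backbone, and names such as `DLCConditionedDenoiser` and `compose_dlcs` are all illustrative assumptions.

```python
# Minimal sketch (illustrative only, not the paper's implementation) of
# conditioning a denoiser on a Discrete Latent Code (DLC) and composing DLCs.
import torch
import torch.nn as nn

SEQ_LEN, VOCAB, DIM = 32, 4096, 256  # assumed DLC length, codebook size, width


class DLCConditionedDenoiser(nn.Module):
    """Toy denoiser conditioned on a DLC token sequence (MLP stands in for a UNet/DiT)."""

    def __init__(self):
        super().__init__()
        self.token_emb = nn.Embedding(VOCAB, DIM)        # embed discrete DLC tokens
        self.cond_proj = nn.Linear(SEQ_LEN * DIM, DIM)   # pool the DLC into one vector
        self.backbone = nn.Sequential(
            nn.Linear(3 * 64 * 64 + DIM + 1, 1024), nn.SiLU(),
            nn.Linear(1024, 3 * 64 * 64),
        )

    def forward(self, x_t, t, dlc):
        # dlc: (batch, SEQ_LEN) integer tokens -> a single conditioning vector
        cond = self.cond_proj(self.token_emb(dlc).flatten(1))
        inp = torch.cat([x_t.flatten(1), cond, t[:, None]], dim=1)
        return self.backbone(inp).view_as(x_t)           # predicted noise


def compose_dlcs(dlc_a, dlc_b, mask):
    """Compose two DLCs token-wise: keep tokens from dlc_a where mask is True."""
    return torch.where(mask, dlc_a, dlc_b)


# Usage: one denoising call conditioned on a composed DLC (random data for illustration).
model = DLCConditionedDenoiser()
x_t = torch.randn(2, 3, 64, 64)
t = torch.rand(2)
dlc_a = torch.randint(VOCAB, (2, SEQ_LEN))
dlc_b = torch.randint(VOCAB, (2, SEQ_LEN))
mask = torch.rand(2, SEQ_LEN) < 0.5
eps = model(x_t, t, compose_dlcs(dlc_a, dlc_b, mask))
```

Because the conditioning signal is a token sequence rather than a continuous vector, composing DLCs amounts to mixing tokens from different codes, which is what allows the generator to be steered toward combinations of image semantics never seen during training.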