Synthetic Bootstrapped Pretraining (SBP) is a new method for pretraining language models. Whereas standard pretraining teaches a model the causal relationships among tokens within a single document, SBP first learns a model of the relationships between documents in the pretraining corpus, uses it to synthesize a large new corpus, and then pretrains on the combined data. Validated by pretraining a 3-billion-parameter model on up to 1 trillion tokens, SBP consistently outperforms a strong repetition baseline and captures a significant fraction of the improvement attainable by an oracle with access to 20x more unique data. Qualitative analysis shows that the synthesized documents are not mere paraphrases: the synthesizer abstracts a core concept from the seed document and crafts a new narration on top of it. This admits a natural Bayesian interpretation, in which the synthesizer implicitly learns the latent concepts shared between related documents.
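
At a high level, the pipeline can be thought of as three stages: mine pairs of related documents from the corpus, train a synthesizer to generate one document conditioned on the other, and sample from it to build the synthetic corpus used for joint pretraining. The sketch below only illustrates this flow; the toy embedding, the trivial synthesizer stub, and all names (`pair_related_documents`, `train_synthesizer`, `build_synthetic_corpus`, `samples_per_doc`) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an SBP-style data pipeline, under the assumptions stated above:
# (1) pair related documents, (2) train a synthesizer d1 -> d2 on those pairs,
# (3) sample a synthetic corpus and mix it with the real data for pretraining.
import numpy as np


def embed(doc: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words hashing embedding (stand-in for a real retrieval encoder)."""
    v = np.zeros(dim)
    for tok in doc.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def pair_related_documents(corpus: list[str], threshold: float = 0.3) -> list[tuple[str, str]]:
    """Stage 1: nearest-neighbour search over document embeddings to mine (d1, d2) pairs."""
    embs = np.stack([embed(d) for d in corpus])
    sims = embs @ embs.T
    np.fill_diagonal(sims, -1.0)  # exclude self-pairs
    pairs = []
    for i, j in enumerate(sims.argmax(axis=1)):
        if sims[i, j] >= threshold:
            pairs.append((corpus[i], corpus[j]))
    return pairs


def train_synthesizer(pairs: list[tuple[str, str]]):
    """Stage 2 (placeholder): in SBP this would fine-tune an LM to model p(d2 | d1).
    Here we return a trivial sampler so the sketch stays self-contained."""
    def synthesize(seed_doc: str) -> str:
        return f"[synthetic narration grounded in: {seed_doc[:40]}...]"
    return synthesize


def build_synthetic_corpus(synthesize, corpus: list[str], samples_per_doc: int = 2) -> list[str]:
    """Stage 3: condition the synthesizer on each real document to expand the corpus."""
    return [synthesize(d) for d in corpus for _ in range(samples_per_doc)]


if __name__ == "__main__":
    corpus = [
        "transformers use attention to model token interactions",
        "attention lets transformers relate distant tokens",
        "gradient descent minimizes the training loss",
    ]
    pairs = pair_related_documents(corpus)
    synthesize = train_synthesizer(pairs)
    synthetic = build_synthetic_corpus(synthesize, corpus)
    pretraining_mix = corpus + synthetic  # joint pretraining data: real + synthetic
    print(f"{len(pairs)} mined pairs, {len(synthetic)} synthetic docs, "
          f"{len(pretraining_mix)} total training docs")
```

In the actual method the synthesizer is itself a language model trained on the mined pairs, so the quality of stage 1 pairing directly bounds what stage 3 can generate; the stub above replaces both with placeholders purely to keep the example runnable.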