This paper studies model collapse, the phenomenon that arises when a large language model (LLM) is iteratively trained on synthetic data generated by the model itself. Specifically, we empirically analyze how the characteristics of the underlying human data affect the resulting distributional shift. Using a variety of human datasets, we perform iterative training and, by manipulating dataset characteristics and applying regression analysis, identify which characteristics predict the magnitude of the shift. We find that lexical diversity amplifies distributional shift, whereas semantic diversity and data quality mitigate it. We further show that these effects are modular: data collected from a particular Internet domain has little influence on content generated for other domains. Finally, experiments on political bias demonstrate that the characteristics of the human data determine whether initial biases are amplified or attenuated. Taken together, our results offer a novel perspective on how different parts of the Internet can experience different types of distributional shift.