This paper introduces the "Reflection Hypothesis," which posits that neural network activity patterns reflect regularities in the training data, moving beyond the "black box" view of neural networks' internal workings. We present evidence for this phenomenon in both simple recurrent neural networks (RNNs) and large language models (LLMs). We then leverage the cognitive concept of "chunking" to propose three methods (DSC, PA, and UCD) that partition high-dimensional neural population dynamics into interpretable units. The three methods complement one another depending on whether labels are available and on the dimensionality of the neural data, extracting units (e.g., words, abstract concepts, and structural schemas) that encode concepts irrespective of model architecture. We demonstrate that the extracted chunks play a causal role in network behavior, suggesting a novel interpretability approach for understanding complex learning systems that are often treated as black boxes.