This paper addresses the problem of overfitting in machine learning models, in particular the phenomenon of memorization, which raises concerns about both privacy and generalization. The standard memorization metric, counterfactual self-influence, quantifies how much a model's prediction on a sample changes depending on whether that sample is included in the training set. However, recent studies have shown that memorization is driven not only by self-influence: other training samples, especially (nearly) duplicated ones, also have a significant effect. In this paper, we study memorization by generalizing counterfactual self-influence to an influence distribution that captures the effect of every training sample on the memorization of a given sample. Using a small-scale language model, we compute this influence distribution over all training samples and analyze its characteristics. We find that considering self-influence alone can severely underestimate the practical risk associated with memorization: the presence of (nearly) duplicated samples substantially lowers self-influence, yet such samples remain (nearly) extractable. We observe a similar pattern in CIFAR-10 image classification, where (nearly) duplicated samples can be identified from the influence distribution alone. We conclude that memorization arises from complex interactions among training samples, and that the overall influence distribution captures memorization far better than self-influence alone.
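
For concreteness, one common formulation of counterfactual self-influence and of its generalization to cross-influence between samples is sketched below; the notation (training algorithm $\mathcal{A}$, training set $S$, samples $(x_i, y_i)$) is illustrative and may differ from the definitions used later in the paper.

% Illustrative notation: A is the (possibly randomized) training algorithm, S the training set,
% and (x_i, y_i) the i-th training sample; probabilities are over the randomness of training.
\[
\mathrm{mem}(\mathcal{A}, S, i) \;=\; \Pr_{f \sim \mathcal{A}(S)}\!\big[f(x_i) = y_i\big] \;-\; \Pr_{f \sim \mathcal{A}(S \setminus \{i\})}\!\big[f(x_i) = y_i\big],
\]
\[
\mathrm{infl}(\mathcal{A}, S, i \leftarrow j) \;=\; \Pr_{f \sim \mathcal{A}(S)}\!\big[f(x_i) = y_i\big] \;-\; \Pr_{f \sim \mathcal{A}(S \setminus \{j\})}\!\big[f(x_i) = y_i\big].
\]

Self-influence is the diagonal case $j = i$; the influence distribution for sample $i$ studied here corresponds to the collection of $\mathrm{infl}(\mathcal{A}, S, i \leftarrow j)$ values over all training samples $j$.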