This paper addresses the tendency of modern deep neural networks (DNNs) to develop heavy-tailed (HT) empirical spectral densities (ESDs) in their layer weight matrices. While previous studies have shown that the HT phenomenon correlates with good generalization in large-scale NNs, a theoretical explanation for its occurrence remains lacking. In particular, understanding the conditions that trigger this phenomenon could help elucidate the interplay between generalization and the spectral density of the weights. This study aims to fill this gap by presenting a simple yet rich setting for modeling the emergence of HT ESDs. Specifically, we present a setting, amenable to theoretical analysis, that "creates" heavy tails in the ESDs of two-layer NNs, and we provide a systematic analysis of the emergence of HT ESDs without any gradient noise. To our knowledge, this is the first study to analyze the noise-free setting and to incorporate optimizer-dependent (GD/Adam) large learning rates into the analysis of HT ESDs. Our results highlight the role of learning rates in the early stages of training in producing the Bulk+Spike and HT shapes of the ESD, which can promote generalization in two-layer NNs. These observations, although made in a much simpler setup, provide insight into the behavior of large-scale NNs.
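
For concreteness, the ESD referred to throughout is the empirical distribution of the eigenvalues of a layer's weight correlation matrix. A minimal sketch of the standard definition used in the HT-ESD literature is given below; the 1/N normalization is an assumption here, since the abstract does not fix one:
\[
\mu_{W} \;=\; \frac{1}{N}\sum_{i=1}^{N}\delta_{\lambda_i}, \qquad \lambda_1,\dots,\lambda_N \in \operatorname{spec}\!\left(\tfrac{1}{N}\,W^{\top} W\right), \qquad W \in \mathbb{R}^{M\times N},
\]
where the ESD is called heavy-tailed when its density decays roughly as a power law, and "Bulk+Spike" refers to a Marchenko-Pastur-like bulk accompanied by a few outlier eigenvalues.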