This paper focuses on modularizing deep neural networks (DNNs) so that individual modules, rather than the full model, can be reused, thereby reducing the inference overhead incurred by full model reuse. Specifically, we focus on modularizing-while-training (MwT) methods, which outperform existing modularizing-after-training methods. We propose NeMo, a scalable MwT approach applicable to large-scale models and diverse DNN architectures, particularly Transformer-based models. NeMo performs modularization at the neuron level and employs a contrastive learning-based modular training method with a composite loss function, enabling application to large-scale models. Experimental results on two Transformer-based models and four CNN models show that NeMo improves module classification accuracy by 1.72% on average and reduces module size by 58.10% compared with the state-of-the-art MwT approach. Case studies on open-source projects further demonstrate the practical benefits of NeMo.
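The abstract names contrastive learning as the basis of NeMo's modular training but does not give the loss itself. Purely as an illustration of the general idea, the sketch below shows a standard supervised contrastive loss in PyTorch applied to per-sample intermediate activation vectors, pulling together samples of the same class and pushing apart samples of different classes. The function name, the temperature value, and the use of plain activation vectors are assumptions for illustration only; NeMo's actual formulation operates at the neuron level with a composite loss and is described in the body of the paper.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Illustrative supervised contrastive loss (not NeMo's exact loss).

    features: (N, D) intermediate activation vectors, one per sample.
    labels:   (N,)   class labels; same-class samples act as positives,
                     all other samples act as negatives.
    """
    z = F.normalize(features, dim=1)          # compare in cosine-similarity space
    sim = z @ z.t() / temperature             # (N, N) pairwise similarity logits
    n = z.size(0)

    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask

    # Softmax over all other samples in the batch, excluding self-similarity.
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability of positives for each anchor that has positives.
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    sum_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    mean_log_prob_pos = sum_log_prob_pos[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()

# Example usage with random data:
#   feats = torch.randn(32, 128)
#   labels = torch.randint(0, 10, (32,))
#   loss = supervised_contrastive_loss(feats, labels)
```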