As the cost of training large language models (LLMs) grows exponentially, new techniques continue to be developed to improve training efficiency. However, predicting the optimal model size and allocating compute resources remain challenging, and most existing scaling laws are specialized to either dense or sparse architectures. In this paper, we propose a generalized scaling law that applies to both dense and sparse LLMs and demonstrate its effectiveness through comparative evaluation against existing scaling laws.
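
As a rough illustration only (not the specific form proposed in this paper), a generalized law of this kind could extend a Chinchilla-style loss parameterization by letting the parameter term depend on an effective parameter count that varies with sparsity; the symbols $E$, $A$, $B$, $\alpha$, $\beta$, $\gamma$, and $N_{\mathrm{eff}}$ below are illustrative assumptions, not fitted quantities from this work:
\[
  L(N, D, s) \;=\; E \;+\; \frac{A}{N_{\mathrm{eff}}(N, s)^{\alpha}} \;+\; \frac{B}{D^{\beta}},
  \qquad
  N_{\mathrm{eff}}(N, s) \;=\; N\,(1 - s)^{\gamma},
\]
where $N$ is the total parameter count, $D$ is the number of training tokens, and $s$ is the sparsity level, with $s = 0$ recovering the dense case. A single parameterization of this form would allow dense and sparse models to be fit jointly rather than with separate, architecture-specific laws.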