This paper presents a theoretical analysis of the sample complexity of diffusion models. We aim to overcome two limitations of existing analyses: performance bounds that degrade with the input data dimension, and reliance on practically unattainable assumptions (e.g., access to an exact empirical risk minimizer). By structurally decomposing the score estimation error into statistical, approximation, and optimization errors, we eliminate the exponential dependence on the number of neural network parameters that appears in existing analyses. In particular, this is the first result to achieve a sample complexity bound of $\widetilde{\mathcal{O}}(\epsilon^{-6})$ without assuming access to an exact empirical risk minimizer of the score estimation loss.
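As a schematic of this decomposition (the notation here is illustrative and ours, not the paper's: $\hat{s}_{\theta}$ denotes the learned score, $n$ the sample size, $\Theta$ the network class, and $T$ the optimization budget), the score estimation error at noise level $t$ can be bounded as
\[
  \mathbb{E}\big\|\hat{s}_{\theta} - \nabla \log p_t\big\|^2
  \;\lesssim\;
  \underbrace{\varepsilon_{\mathrm{stat}}(n)}_{\text{statistical error}}
  \;+\;
  \underbrace{\varepsilon_{\mathrm{approx}}(\Theta)}_{\text{approximation error}}
  \;+\;
  \underbrace{\varepsilon_{\mathrm{opt}}(T)}_{\text{optimization error}},
\]
where the statistical term shrinks with the sample size, the approximation term reflects the expressiveness of the network class, and the optimization term accounts for the gap left by an inexact optimizer; it is this last term that removes the need to assume an exact empirical risk minimizer.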