This paper introduces Progressive Weight Loading (PWL), a technique that addresses the long model-loading times and initial inference delays caused by the increasing size and complexity of deep learning models. PWL enables fast initial inference by first deploying a lightweight student model and then gradually replacing its layers with those of a pre-trained teacher model. To support this, we propose a training method that aligns intermediate feature representations between the student and teacher layers while also improving the student model's overall output performance. Experimental results show that models trained with PWL maintain strong distillation performance and steadily improve in accuracy as teacher layers are loaded, ultimately matching the accuracy of the full teacher model without compromising initial inference speed.
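The layer-swapping scheme described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the `PWLModel` class, `make_block` helper, and the assumption that student and teacher blocks share the same feature dimension at every block boundary are all hypothetical choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # shared feature dimension at each block boundary (a simplifying assumption)

def make_block(scale):
    """Return a toy block: a linear map followed by ReLU."""
    W = rng.normal(scale=scale, size=(DIM, DIM))
    return lambda x: np.maximum(x @ W, 0.0)

class PWLModel:
    """Hybrid model: serves the student immediately, swaps in teacher blocks as they load."""
    def __init__(self, student_blocks, teacher_blocks):
        assert len(student_blocks) == len(teacher_blocks)
        self.student_blocks = student_blocks
        self.teacher_blocks = teacher_blocks
        self.use_teacher = [False] * len(student_blocks)  # which positions hold a teacher block

    def load_teacher_layer(self, i):
        # In a real system the teacher weights would be read from disk here;
        # the sketch just flips a flag once block i is "loaded".
        self.use_teacher[i] = True

    def forward(self, x):
        for i, loaded in enumerate(self.use_teacher):
            block = self.teacher_blocks[i] if loaded else self.student_blocks[i]
            x = block(x)
        return x

# Usage: inference is available from the start, and blocks are upgraded one by one.
students = [make_block(0.1) for _ in range(3)]
teachers = [make_block(0.1) for _ in range(3)]
model = PWLModel(students, teachers)
x = rng.normal(size=(2, DIM))
y_initial = model.forward(x)     # all-student inference, no waiting for the teacher
model.load_teacher_layer(0)      # teacher block 0 has finished loading
model.load_teacher_layer(1)
y_partial = model.forward(x)     # hybrid student/teacher inference
```

The feature-alignment training mentioned in the abstract is what makes such mid-network handoffs viable: without it, a teacher block would receive activations distributed differently from those it was trained on.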