This paper applies quantum computing to the large-scale capacitated pickup and delivery problem with time windows (CPDPTW). Specifically, we propose a method that integrates parameterized quantum circuits (PQCs) into a reinforcement learning (RL) framework to minimize travel time in realistic last-mile delivery services. We design a quantum circuit with a problem-specific encoding, followed by entanglement and variational layers, and compare the proposed method against PPO and QSVT. In these comparative experiments, the proposed method shows advantages in the problem scale it can handle and in training complexity, offering an efficient approach to a large-scale problem that is difficult to handle with existing classical approaches.
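To make the circuit structure concrete, the following is a minimal NumPy sketch of a PQC with an encoding layer, entangling layers, and variational layers whose measurements are turned into action probabilities for an RL policy. It is an illustration under stated assumptions, not the circuit used in this work: the RY angle encoding, the ring of CNOT gates, the per-qubit Pauli-Z readout, the softmax over expectation values, and all function names (`apply_ry`, `apply_cnot`, `expect_z`, `pqc_outputs`, `policy`) are hypothetical choices made for the example.

```python
import numpy as np

def apply_ry(state, theta, qubit, n_qubits):
    """Apply an RY(theta) rotation to one qubit of an n-qubit statevector."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    gate = np.array([[c, -s], [s, c]])
    psi = state.reshape([2] * n_qubits)
    # Contract the gate with the chosen qubit axis, then restore axis order.
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    psi = np.moveaxis(psi, 0, qubit)
    return psi.reshape(-1)

def apply_cnot(state, control, target, n_qubits):
    """Apply CNOT: flip the target qubit on the control = 1 subspace."""
    psi = state.reshape([2] * n_qubits).copy()
    idx = [slice(None)] * n_qubits
    idx[control] = 1
    sub = psi[tuple(idx)]  # control axis is dropped in this slice
    t = target - 1 if target > control else target
    psi[tuple(idx)] = np.flip(sub, axis=t).copy()
    return psi.reshape(-1)

def expect_z(state, qubit, n_qubits):
    """Expectation value of Pauli-Z on one qubit."""
    probs = np.abs(state.reshape([2] * n_qubits)) ** 2
    other_axes = tuple(a for a in range(n_qubits) if a != qubit)
    p = probs.sum(axis=other_axes)
    return p[0] - p[1]

def pqc_outputs(features, weights):
    """Run the circuit; requires at least 2 qubits (one per feature).

    features: shape (n,), assumed to be angles encoding the problem state.
    weights:  shape (layers, n), trainable variational angles.
    """
    n = len(features)
    state = np.zeros(2 ** n)
    state[0] = 1.0  # start in |0...0>
    for q in range(n):  # encoding layer (assumed: RY angle encoding)
        state = apply_ry(state, features[q], q, n)
    for layer in weights:
        for q in range(n):  # entanglement layer (assumed: CNOT ring)
            state = apply_cnot(state, q, (q + 1) % n, n)
        for q in range(n):  # variational layer (trainable RY angles)
            state = apply_ry(state, layer[q], q, n)
    return np.array([expect_z(state, q, n) for q in range(n)])

def policy(features, weights):
    """Softmax over Z expectations, read as action probabilities."""
    logits = pqc_outputs(features, weights)
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.uniform(0, np.pi, size=3)   # placeholder state features
weights = rng.normal(0, 0.1, size=(2, 3))  # two variational layers
action_probs = policy(features, weights)
```

In an RL loop, `weights` would be the parameters updated by the policy-gradient step, while `features` would be recomputed from the routing state at each decision. A statevector simulation like this one scales exponentially in the number of qubits, so it is only suitable for small illustrative cases.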