To address the high computational and memory overhead of large language models (LLMs), this paper proposes a novel model pruning method tailored to prefill-decode (PD) partitioned inference. To overcome the limitations of existing methods, which do not account for the characteristics of PD partitioning, we construct pruning and distillation sets that perform iterative block removal independently for the prefill and decode stages. We further introduce a token-aware cache pruning mechanism that retains all KV cache entries during the prefill stage but, during the decode stage, selectively reuses only the KV cache entries corresponding to the first and last token sequences of selected layers, thereby minimizing communication costs. Experimental results demonstrate that the proposed method achieves superior performance and faster inference in both PD-partitioned and non-partitioned settings, while reducing data transmission bandwidth consumption by 4.95x.
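To make the token-aware cache pruning idea concrete, the following is a minimal sketch of how the prefill-to-decode KV cache transfer could be trimmed: for a chosen subset of layers, only the cache entries of the first and last token positions are kept, and the cache of all other layers is dropped before transmission. All names here (prune_kv_cache_for_decode, selected_layers, first_n, last_n) and the tensor layout are illustrative assumptions, not the paper's actual implementation or its layer-selection criterion.

```python
import torch

def prune_kv_cache_for_decode(kv_cache, selected_layers, first_n=4, last_n=16):
    """Hypothetical token-aware KV cache pruning before the prefill -> decode transfer.

    kv_cache: dict mapping layer index -> (key, value) tensors of shape
              [batch, heads, seq_len, head_dim] produced during prefill.
    selected_layers: layers whose cache is reused in the decode stage; only the
              first `first_n` and last `last_n` token positions are kept for
              these layers, and the cache of all other layers is not sent.
    """
    pruned = {}
    for layer_idx in selected_layers:
        k, v = kv_cache[layer_idx]
        seq_len = k.shape[2]
        if seq_len <= first_n + last_n:
            pruned[layer_idx] = (k, v)  # sequence too short to prune
            continue
        keep = torch.cat(
            [torch.arange(first_n), torch.arange(seq_len - last_n, seq_len)]
        )
        pruned[layer_idx] = (k[:, :, keep, :], v[:, :, keep, :])
    return pruned

if __name__ == "__main__":
    # Toy prefill cache: 4 layers, batch 1, 8 heads, 128 tokens, head_dim 64.
    cache = {
        i: (torch.randn(1, 8, 128, 64), torch.randn(1, 8, 128, 64))
        for i in range(4)
    }
    pruned = prune_kv_cache_for_decode(cache, selected_layers=[1, 3])
    full_bytes = sum(k.numel() + v.numel() for k, v in cache.values()) * 4
    sent_bytes = sum(k.numel() + v.numel() for k, v in pruned.values()) * 4
    print(f"transfer reduced from {full_bytes} to {sent_bytes} bytes")
```

Under these toy settings, the transmitted cache shrinks roughly in proportion to the fraction of retained layers and token positions, which is the mechanism behind the reported bandwidth savings.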