This paper proposes DP-LET, an efficient framework for accurately predicting spatiotemporal network traffic to dynamically manage computational resources and minimize energy consumption in modern communication systems. DP-LET consists of a data processing module, a local feature enhancement module, and a Transformer-based prediction module. The data processing module is designed for highly efficient denoising and spatial separation of network data, while the local feature enhancement module utilizes multiple Temporal Convolutional Networks (TCNs) to capture fine-grained local features. The prediction module uses a Transformer encoder to model long-term dependencies and assess feature relevance. A case study on real-world cellular traffic prediction demonstrates that DP-LET achieves state-of-the-art performance, reducing the MSE by 31.8% and the MAE by 23.1% compared to baseline models, while maintaining low computational complexity.
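As an illustrative aside, the sketch below shows two ideas the abstract relies on: the causal dilated convolution that TCNs are generally built from, and how a relative error reduction such as the reported MSE and MAE figures can be computed. It is a minimal pure-Python sketch, not the DP-LET implementation. The function names (`causal_dilated_conv1d`, `mse`, `mae`, `relative_reduction`), the kernel values, and the zero left-padding are assumptions made for illustration. It also assumes the reported percentages are reductions relative to the baseline error.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Hypothetical helper: causal dilated 1-D convolution.

    out[t] = sum_k kernel[k] * x[t - k * dilation], so each output depends
    only on the current and past samples. Indices before the start of the
    series are treated as zeros (implicit left zero-padding).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # look back k * dilation steps
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error over paired samples."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error over paired samples."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` error is lower than `baseline` error."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0]  # toy traffic values, not real data
    # With dilation 2, each output mixes the current sample and the one
    # two steps earlier, widening the receptive field without more weights.
    print(causal_dilated_conv1d(series, [1.0, 1.0], dilation=2))
    print(relative_reduction(baseline=2.0, proposed=1.0))
```

Stacking such layers with increasing dilation is the usual way a TCN covers a longer history at low cost, which is consistent with the abstract's emphasis on fine-grained local features and low computational complexity.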