In this paper, we propose VARMAformer, a novel architecture that improves the efficiency and accuracy of Transformer-based time series forecasting. While preserving the efficiency of existing cross-attention-only methods, it integrates the strengths of the classical VARMA model to capture local temporal dependencies more effectively. Its key innovations are the VARMA-inspired Feature Extractor (VFE), which explicitly models autoregressive (AR) and moving-average (MA) patterns, and the VARMA-Enhanced Attention (VE-atten) mechanism, which enhances contextual awareness. Experiments on a range of benchmark datasets show that VARMAformer outperforms existing state-of-the-art models, demonstrating the significant benefits of integrating classical statistical insights into modern deep learning frameworks for time series forecasting.
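To make the VARMA-inspired design concrete, the following is a minimal PyTorch sketch of how a feature extractor in this spirit could combine AR and MA terms at the patch level. It is an illustrative assumption, not the paper's implementation: the module name, the choice of first differences as innovation proxies, and the hyperparameters `p`, `q`, `patch_len`, and `d_model` are all hypothetical.

```python
import torch
import torch.nn as nn


class VARMAInspiredExtractor(nn.Module):
    """Hypothetical sketch of a VARMA-inspired feature extractor (VFE-style).

    An AR branch projects the most recent p values of each patch, an MA
    branch projects the last q first-difference "innovation" proxies, and
    both are added to a global patch embedding.
    """

    def __init__(self, patch_len: int, d_model: int, p: int = 4, q: int = 4):
        super().__init__()
        self.p, self.q = p, q
        self.ar_proj = nn.Linear(p, d_model)             # autoregressive (AR) terms
        self.ma_proj = nn.Linear(q, d_model)             # moving-average (MA) terms
        self.patch_proj = nn.Linear(patch_len, d_model)  # global patch embedding

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_len)
        ar_terms = patches[..., -self.p:]                  # most recent lags
        residuals = patches[..., 1:] - patches[..., :-1]   # crude innovation proxy
        ma_terms = residuals[..., -self.q:]
        return (
            self.patch_proj(patches)
            + self.ar_proj(ar_terms)
            + self.ma_proj(ma_terms)
        )


if __name__ == "__main__":
    x = torch.randn(8, 12, 16)  # (batch, num_patches, patch_len)
    feats = VARMAInspiredExtractor(patch_len=16, d_model=64)(x)
    print(feats.shape)  # torch.Size([8, 12, 64])
```

The design intent, under these assumptions, is that the AR and MA branches inject explicit local-dependency structure into each patch embedding before any attention is applied, mirroring how classical VARMA models separate lagged-value and lagged-error effects.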