Recent advances in video editing have brought deep learning models that capture spatiotemporal dependencies into the mainstream. However, these models struggle to scale to long-duration, high-resolution videos because existing self-attention mechanisms have computational complexity that is quadratic in sequence length. To address this limitation, this paper proposes VRWKV-Editor, a novel video editing model that integrates a linear spatiotemporal aggregation module into a video-based diffusion model. VRWKV-Editor leverages the bidirectional weighted key-value (WKV) recurrence mechanism of the RWKV architecture to capture global dependencies while preserving temporal consistency, achieving linear complexity in sequence length without compromising editing quality.
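To make the aggregation mechanism concrete, the following is a minimal sketch (not the authors' released code) of a bidirectional WKV aggregation in PyTorch, computed in linear time via one forward and one backward recurrence. The function name `bi_wkv`, the per-channel decay `w`, the current-token bonus `u`, and the simplified distance-based decay are illustrative assumptions; published RWKV variants differ in the exact decay parameterization and use numerically stabilized kernels.

```python
# Minimal sketch of bidirectional weighted key-value (Bi-WKV) aggregation.
# Token i contributes to token t with weight exp(k_i - |t - i| * w);
# the current token gets weight exp(u + k_t). Cost is O(T * C), not O(T^2).
# All names and the decay form are illustrative assumptions.
import torch

def bi_wkv(w, u, k, v):
    """w, u: (C,) decay and current-token bonus; k, v: (T, C) keys/values.

    Returns a (T, C) tensor of aggregated values.
    """
    T, C = k.shape
    decay = torch.exp(-w)   # per-step decay factor in (0, 1), assumes w > 0
    ek = torch.exp(k)       # exp(k_i); assumes small k for numerical safety
    out = torch.empty_like(v)

    # Forward recurrence: a_f[t] = sum_{i<t} exp(k_i - (t-i) w) v_i
    a = torch.zeros(C); b = torch.zeros(C)
    a_f = []; b_f = []
    for t in range(T):
        a_f.append(a.clone()); b_f.append(b.clone())
        a = decay * (a + ek[t] * v[t])
        b = decay * (b + ek[t])

    # Backward recurrence: a_b[t] = sum_{i>t} exp(k_i - (i-t) w) v_i,
    # combined on the fly with the stored forward state and the
    # current-token term to produce the normalized output.
    a = torch.zeros(C); b = torch.zeros(C)
    for t in reversed(range(T)):
        cur = torch.exp(u + k[t])
        out[t] = (a_f[t] + a + cur * v[t]) / (b_f[t] + b + cur)
        a = decay * (a + ek[t] * v[t])
        b = decay * (b + ek[t])
    return out

# Usage on a toy sequence of T tokens with C channels:
T, C = 16, 8
w = torch.rand(C) * 0.5 + 0.1   # positive decays
u = torch.zeros(C)
k = torch.randn(T, C) * 0.1
v = torch.randn(T, C)
y = bi_wkv(w, u, k, v)          # (T, C)
```

Because each token's output is a decayed sum over all other tokens in both directions, the receptive field is global, yet the two scans touch each token once, which is the source of the linear complexity claimed above.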