This paper introduces frequency-domain physical priors to address the tendency of current video diffusion models to violate physical laws. Without modifying the model architecture, we improve motion plausibility by encoding common rigid-body motions (translation, rotation, and scaling) as a lightweight spectral loss. The loss uses only 2.7% of the frequency coefficients while retaining over 97% of the spectral energy. Applied to Open-Sora, MVDIT, and Hunyuan, our method yields an average 11% improvement in motion accuracy and action recognition on OpenVID-1M while maintaining visual quality. In user studies, our results are preferred 74% to 83% of the time; we further reduce warping error by 22% to 37% and improve temporal coherence scores.
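The abstract does not describe the implementation, but the core idea of comparing predicted and reference motion in a truncated frequency basis can be sketched roughly as below. Everything in this sketch is an assumption rather than the paper's actual method: the function names (`truncated_spectrum`, `spectral_motion_loss`), the per-channel energy-based coefficient selection, the 97% energy threshold as the selection rule, and the use of PyTorch's FFT are all illustrative; the motion trajectories would come from whatever per-frame rigid-body parameters (translation, rotation, scaling) are extracted from the video.

```python
import torch

def truncated_spectrum(traj, energy_keep=0.97):
    """Zero out all but the few frequency coefficients that carry most of
    the motion energy (illustrative; the paper's exact rule is not given).

    traj: (T, D) per-frame motion parameters, e.g. translation, rotation
          angle, and log-scale stacked along D.
    Returns the masked complex spectrum and the (F, D) keep mask.
    """
    spec = torch.fft.rfft(traj, dim=0)                # (F, D) complex spectrum
    energy = spec.abs() ** 2                          # per-coefficient energy
    order = torch.argsort(energy, dim=0, descending=True)
    sorted_energy = torch.gather(energy, 0, order)
    total = sorted_energy.sum(dim=0, keepdim=True).clamp_min(1e-12)
    cum = torch.cumsum(sorted_energy, dim=0) / total
    # Keep the strongest coefficient plus all whose predecessors have not
    # yet reached the energy threshold, so retained energy >= energy_keep.
    keep_sorted = torch.cat(
        [torch.ones_like(cum[:1], dtype=torch.bool), cum[:-1] < energy_keep], dim=0
    )
    mask = torch.zeros_like(energy)
    mask.scatter_(0, order, keep_sorted.float())      # map back to frequency order
    return spec * mask, mask

def spectral_motion_loss(pred_traj, ref_traj, energy_keep=0.97):
    """L2 distance between the truncated spectra of predicted and reference motion."""
    ref_spec, mask = truncated_spectrum(ref_traj, energy_keep)
    pred_spec = torch.fft.rfft(pred_traj, dim=0) * mask
    return (pred_spec - ref_spec).abs().pow(2).mean()
```

Under these assumptions, the loss touches only the handful of retained coefficients, which is what would keep it lightweight relative to a dense per-frame motion penalty.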