Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

Physics-Guided Motion Loss for Video Generation Model

Created by
  • Haebom

Author

Bowen Xue, Giuseppe Claudio Guarnera, Shuang Zhao, Zahra Montazeri

Outline

This paper introduces frequency-domain physical priors to address the tendency of current video diffusion models to violate physical laws. Without modifying the model architecture, it improves motion plausibility by decomposing common rigid-body motions (translation, rotation, and scaling) into a lightweight spectral loss. The method requires only 2.7% of the frequency coefficients while retaining over 97% of the spectral energy. Applied to Open-Sora, MVDIT, and Hunyuan, it yields an average 11% improvement in motion accuracy and action recognition on OpenVID-1M while maintaining visual quality. In user studies, it achieves preference rates of 74% to 83%, reduces warping errors by 22% to 37%, and improves temporal coherence scores.
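The core idea — comparing motion signals through a small set of dominant frequency coefficients rather than frame by frame — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the function name, the coefficient-selection rule (top-k by reference magnitude), and the squared-magnitude penalty are all assumptions made for clarity.

```python
import numpy as np

def spectral_motion_loss(pred_motion, ref_motion, keep_ratio=0.027):
    """Hypothetical sketch of a frequency-domain motion loss.

    pred_motion, ref_motion: arrays of shape (T, D) holding per-frame
    rigid-motion parameters (e.g. translation, rotation angle, scale)
    over T frames. Only the top `keep_ratio` fraction of frequency
    coefficients (chosen by reference magnitude, mirroring the paper's
    "2.7% of coefficients" figure) contribute to the loss.
    """
    # Real FFT along the time axis: (T, D) -> (T//2 + 1, D) complex bins.
    pred_f = np.fft.rfft(pred_motion, axis=0)
    ref_f = np.fft.rfft(ref_motion, axis=0)

    # Keep the k largest-magnitude reference coefficients per channel.
    k = max(1, int(np.ceil(keep_ratio * ref_f.shape[0])))
    mask = np.zeros(ref_f.shape, dtype=bool)
    top = np.argsort(np.abs(ref_f), axis=0)[-k:]  # indices of top-k bins
    np.put_along_axis(mask, top, True, axis=0)

    # Squared distance on the retained coefficients only.
    diff = np.where(mask, pred_f - ref_f, 0.0)
    return float(np.mean(np.abs(diff) ** 2))
```

Because the mask zeroes out all but a few bins, the loss adds negligible compute on top of the diffusion training objective, which is consistent with the paper's claim that no architectural change is needed.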

Takeaways, Limitations

Takeaways:
A simple and effective method to improve physical plausibility without changing model architecture is presented.
Applicable to various video diffusion models such as Open-Sora, MVDIT, and Hunyuan.
Improved motion accuracy and action recognition performance while maintaining visual quality.
Demonstrated improvement in user preference and temporal consistency.
Limitations:
Specific limitations are not discussed in the paper.