Daily Arxiv

This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions. When sharing, please cite the source.

Diffusion Adversarial Post-Training for One-Step Video Generation

Created by
  • Haebom

Authors

Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang

Outline

Diffusion models are widely used for image and video generation, but their iterative sampling process is slow and expensive. Existing distillation methods have shown that one-step generation is possible in the image domain, but they still suffer from significant quality degradation. In this study, we propose adversarial post-training (APT) against real data, applied after diffusion pre-training, for one-step video generation. To improve training stability and quality, we introduce several improvements to the model architecture and training procedure, as well as an approximate R1 regularization objective. Experimental results show that Seaweed-APT, the adversarially post-trained model, can generate a 2-second, 1280×720, 24fps video in real time with a single forward evaluation step. The same model also generates 1024px images in a single step, with quality comparable to state-of-the-art methods.
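The summary names an approximate R1 regularization objective but does not give its form. As a rough illustration only, the sketch below assumes one common way to avoid computing input gradients: penalize the squared change in the discriminator output when real data is perturbed by small Gaussian noise. For small noise scale `sigma`, that penalty divided by `sigma**2` approaches the exact R1 term (the squared norm of the discriminator's gradient with respect to its input). The names `toy_discriminator`, `exact_r1`, `approx_r1`, and `sigma` are hypothetical, and the tiny tanh discriminator stands in for a real network; nothing here is taken from the paper's implementation.

```python
import numpy as np

def toy_discriminator(x, w):
    """Hypothetical stand-in discriminator: one scalar score per sample."""
    return np.tanh(x @ w)

def exact_r1(x, w):
    """Exact R1 term for the toy model: squared norm of dD/dx per sample."""
    # d/dx tanh(x . w) = (1 - tanh^2(x . w)) * w
    scale = 1.0 - np.tanh(x @ w) ** 2
    grad = scale[:, None] * w[None, :]
    return np.sum(grad ** 2, axis=1)

def approx_r1(x, w, sigma, rng, n_draws=1):
    """Assumed gradient-free surrogate: mean squared change in D under noise."""
    d_clean = toy_discriminator(x, w)                    # shape (N,)
    noise = rng.standard_normal((n_draws,) + x.shape)    # shape (n_draws, N, D)
    d_noisy = toy_discriminator(x + sigma * noise, w)    # shape (n_draws, N)
    return np.mean((d_clean[None, :] - d_noisy) ** 2, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # four "real" samples with eight features
w = 0.3 * rng.standard_normal(8)     # toy discriminator weights
sigma = 0.01                         # small perturbation scale (assumed value)

exact = exact_r1(x, w)
# Many draws are used here only to show the match; training would use one.
approx = approx_r1(x, w, sigma, rng, n_draws=20000) / sigma ** 2
```

In a training loop, a surrogate like this would be added to the discriminator loss with a weighting coefficient; it needs only two forward passes and no second-order gradients, which is the usual motivation for approximating R1 in large models.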

Takeaways, Limitations

Takeaways:
We propose Seaweed-APT, a one-step video generation model that addresses the slow generation speed of existing diffusion models.
It generates a 2-second, 1280×720, 24fps video in real time, and 1024px images, each in a single step.
Training stability and quality are improved through changes to the model architecture and training procedure and through an approximate R1 regularization objective.
Limitations:
The specific improvements to the model architecture and training procedure are not described in detail.
No detailed performance comparison with other recent models is given.
Quantitative evaluation metrics (e.g., FID, IS) for the quality of the generated videos and images are not provided.