Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation

Created by
  • Haebom

Authors

Youping Gu, Xiaolong Li, Yuhao Hu, Minqi Chen, Bohan Zhuang

BLADE: Data-Free Joint Acceleration of Diffusion Transformers for Video Generation

Outline

This paper proposes BLADE, a data-free joint training framework that combines Adaptive Block-Sparse Attention (ASA) with sparsity-aware step distillation to address the inference bottleneck of Diffusion Transformers for high-quality video generation. BLADE consists of an ASA mechanism that dynamically generates content-aware sparsity masks, and a sparsity-aware step distillation scheme, built on Trajectory Distribution Matching (TDM), that integrates sparsity directly into the distillation process. In experiments on text-to-video models such as CogVideoX-5B and Wan2.1-1.3B, BLADE delivers substantial efficiency gains, achieving end-to-end inference acceleration of 14.10x on Wan2.1-1.3B and 8.89x on CogVideoX-5B. The acceleration is accompanied by quality improvements on the VBench-2.0 benchmark (CogVideoX-5B rises from 0.534 to 0.569, Wan2.1-1.3B from 0.563 to 0.570) and is corroborated by human evaluation.
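The summary does not describe how ASA builds its masks, but the general idea of content-aware block-sparse attention can be illustrated with a minimal PyTorch sketch. This is an assumption-laden simplification, not the paper's kernel: block importance is scored here by mean-pooled query/key blocks, and the names and parameters (`adaptive_block_sparse_attention`, `block_size`, `keep_ratio`) are hypothetical stand-ins for whatever rule BLADE actually uses.

```python
import torch
import torch.nn.functional as F

def adaptive_block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.25):
    """Sketch of block-sparse attention with a content-aware mask.

    Block importance is estimated from mean-pooled query/key blocks, and
    only the top `keep_ratio` fraction of key blocks is attended per query
    block. Shapes: q, k, v are (batch, heads, seq_len, dim), with seq_len
    divisible by block_size. Illustrative only; not the BLADE ASA kernel.
    """
    b, h, n, d = q.shape
    nb = n // block_size

    # Mean-pool queries and keys within each block to score block pairs.
    q_blk = q.view(b, h, nb, block_size, d).mean(dim=3)   # (b, h, nb, d)
    k_blk = k.view(b, h, nb, block_size, d).mean(dim=3)   # (b, h, nb, d)
    scores = q_blk @ k_blk.transpose(-1, -2) / d ** 0.5   # (b, h, nb, nb)

    # Keep the highest-scoring key blocks for each query block.
    k_keep = max(1, int(nb * keep_ratio))
    topk = scores.topk(k_keep, dim=-1).indices
    block_mask = torch.zeros_like(scores, dtype=torch.bool)
    block_mask.scatter_(-1, topk, torch.ones_like(topk, dtype=torch.bool))

    # Expand the block mask to token resolution and run masked attention.
    token_mask = block_mask.repeat_interleave(block_size, dim=-2)
    token_mask = token_mask.repeat_interleave(block_size, dim=-1)
    attn = (q @ k.transpose(-1, -2)) / d ** 0.5
    attn = attn.masked_fill(~token_mask, float("-inf"))
    return F.softmax(attn, dim=-1) @ v

# Example: 2 heads over a 256-token sequence, masked in 64-token blocks.
q = torch.randn(1, 2, 256, 32)
out = adaptive_block_sparse_attention(q, q, q)
print(out.shape)  # torch.Size([1, 2, 256, 32])
```

Note that masking a dense score matrix, as above, only illustrates the semantics; the speedups reported in the paper require a dedicated sparse attention kernel that skips the masked blocks entirely.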

Takeaways, Limitations

Takeaways:
Dramatically improves the inference speed of Diffusion Transformer-based video generation models.
Presents a framework that achieves efficient acceleration without any training data.
Improves video generation quality while increasing speed.
Demonstrates applicability to models of various sizes.
Limitations:
The paper does not explicitly state its limitations.
(Inferred) Performance may be biased toward the specific models and datasets evaluated.
(Inferred) The combination of ASA and TDM may be complex to implement and tune.