Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

Created by
  • Haebom

Authors

Xingyang Li, Muyang Li, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, Song Han

Outline

In this paper, we propose Radial Attention, a scalable sparse attention mechanism that exploits spatiotemporal energy decay to address the growing computational cost of video diffusion models. Because attention scores decay as the spatial and temporal distance between tokens increases, Radial Attention reduces the computational complexity to O(n log n), far more efficient than dense attention's O(n²) while remaining more expressive than linear attention. The mechanism restricts attention to spatially nearby tokens, with the attention window shrinking as temporal distance grows. In addition, LoRA-based fine-tuning efficiently extends the generation length of a pre-trained video diffusion model.

Experimental results on models such as Wan2.1-14B, HunyuanVideo, and Mochi 1 show that Radial Attention achieves up to 1.9x speedup while maintaining video quality. It enables up to 4x longer video generation with minimal fine-tuning, reduces training costs by up to 4.4x compared to direct fine-tuning, and speeds up inference by up to 3.7x compared to dense attention.
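As a rough illustration of the idea only (this is not the authors' implementation; `radial_mask`, `base_window`, and the exact decay schedule are assumptions), the sketch below builds a boolean sparse-attention mask whose spatial window halves each time the temporal distance between frames doubles:

```python
import numpy as np

def radial_mask(num_frames: int, tokens_per_frame: int, base_window: int) -> np.ndarray:
    """Toy radial attention mask over a 1D-per-frame token layout.

    The spatial attention window shrinks exponentially with the
    temporal distance between the query frame and the key frame.
    (Illustrative sketch; the paper's method operates on 3D video
    tokens with fused attention kernels.)
    """
    n = num_frames * tokens_per_frame
    mask = np.zeros((n, n), dtype=bool)
    for qf in range(num_frames):          # query frame index
        for kf in range(num_frames):      # key frame index
            dt = abs(qf - kf)             # temporal distance
            # Halve the window per doubling of temporal distance,
            # so far-apart frames exchange very little attention.
            window = base_window if dt == 0 else max(1, base_window >> dt.bit_length())
            for qs in range(tokens_per_frame):
                lo = max(0, qs - window)
                hi = min(tokens_per_frame, qs + window + 1)
                q = qf * tokens_per_frame + qs
                mask[q, kf * tokens_per_frame + lo : kf * tokens_per_frame + hi] = True
    return mask
```

Under this kind of decay schedule, the number of attended key tokens per query grows only logarithmically with the temporal span, which is the intuition behind the O(n log n) total cost versus O(n²) for a dense mask.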

Takeaways, Limitations

Takeaways:
We present Radial Attention, a novel sparse attention mechanism that effectively reduces the computational cost of video diffusion models.
It is much more efficient than conventional dense attention and greatly improves video generation speed and training efficiency.
It efficiently extends the generation length of pre-trained models.
Performance improvements are validated across various video diffusion models.
Limitations:
The performance improvements of Radial Attention may be limited to certain models and datasets.
Further research is needed on hyperparameter optimization, such as adjusting the attention window size.
Lack of performance evaluation for extremely long video generation.
Further comparative analysis with other sparse attention mechanisms is needed.