Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Mixture of Contexts for Long Video Generation

Created by
  • Haebom

Authors

Shengqu Cai, Ceyuan Yang, Lvmin Zhang, Yuwei Guo, Junfei Xiao, Ziyan Yang, Yinghao Xu, Zhenheng Yang, Alan Yuille, Leonidas Guibas, Maneesh Agrawala, Lu Jiang, Gordon Wetzstein

Outline

Long video generation faces a fundamental long-context memory problem: maintaining and retrieving events over extended time spans. Applying diffusion transformers to long-context video is limited by the quadratic cost of self-attention. This paper recasts the problem as internal information retrieval and proposes Mixture of Contexts (MoC), a simple, learnable sparse attention routing module that serves as an effective long-term memory retrieval engine. In MoC, each query dynamically selects a few information-rich chunks plus mandatory anchors (the caption and local windows) to attend to, with causal routing that prevents loop closures. By scaling the data and progressively sparsifying the routing, the model allocates computation to salient history, preserving identities, actions, and scenes across minutes of content. This retrieval-based approach scales nearly linearly with sequence length, makes training and synthesis practical, and exhibits emergent memory and consistency at the minute scale.
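To make the routing idea concrete, here is a minimal numpy sketch of MoC-style sparse attention for a single head. It is an illustration of the mechanism described above, not the paper's implementation: chunk size, the top-k value, and treating chunk 0 as the mandatory caption anchor are all assumptions for the example. Each query scores mean-pooled chunk descriptors, keeps the top-k causally valid chunks plus the anchor, and attends only over tokens in the selected chunks.

```python
import numpy as np

def moc_attention(q, k, v, chunk_size=4, top_k=2, anchor_chunks=(0,)):
    """Sketch of Mixture-of-Contexts sparse routing (single head).

    Assumes the sequence length is a multiple of chunk_size, and that
    anchor_chunks (e.g. a caption chunk) must always be attended to.
    """
    T, d = k.shape
    n_chunks = T // chunk_size
    # Cheap per-chunk descriptors: mean-pool the keys within each chunk.
    descriptors = k[: n_chunks * chunk_size].reshape(n_chunks, chunk_size, d).mean(axis=1)

    out = np.zeros_like(v[: len(q)])
    for t, qt in enumerate(q):
        cur_chunk = t // chunk_size
        # Causal routing: only chunks at or before the query's own chunk.
        valid = np.arange(cur_chunk + 1)
        scores = descriptors[valid] @ qt
        # Select the top-k most relevant chunks among the valid ones.
        k_eff = min(top_k, len(valid))
        chosen = set(int(c) for c in valid[np.argsort(scores)[-k_eff:]])
        # Mandatory anchors (hypothetical: chunk 0 as the caption chunk).
        chosen.update(c for c in anchor_chunks if c <= cur_chunk)
        # Gather token indices from the selected chunks, causally masked.
        idx = np.concatenate(
            [np.arange(c * chunk_size, (c + 1) * chunk_size) for c in sorted(chosen)]
        )
        idx = idx[idx <= t]
        # Standard softmax attention, but only over the routed tokens.
        att = np.exp(k[idx] @ qt / np.sqrt(d))
        att /= att.sum()
        out[t] = att @ v[idx]
    return out
```

Because each query attends to a bounded number of chunks rather than the whole history, the per-query cost stays roughly constant as the sequence grows, which is the source of the near-linear scaling claimed in the paper.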

Takeaways, Limitations

Takeaways:
Redefines long video generation as an internal information-retrieval problem.
Efficiently addresses the long-term memory problem via sparse attention in the MoC module.
Maintains consistency of identities, actions, and scenes in minutes-long video generation.
Enables practical training and synthesis with efficient computation.
Limitations:
The paper does not explicitly discuss its limitations.