Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement

Created by
  • Haebom

Author

Yufan Deng, Yuanyang Yin, Xun Guo, Yizhi Wang, Jacob Zhiyuan Fang, Shenghai Yuan, Yiding Yang, Angtian Wang, Bo Liu, Haibin Huang, Chongyang Ma

Outline

This paper addresses "any-reference video generation": synthesizing videos from arbitrary types and combinations of reference subjects together with a text prompt. To tackle identity inconsistency, entanglement between multiple reference subjects, and copy-paste artifacts, the authors propose a unified framework called MAGREF. MAGREF adapts flexibly to diverse reference images and text prompts via masked guidance and a subject-disentanglement mechanism. Masked guidance preserves the appearance of each subject through region-aware masking and pixel-wise channel concatenation, while the subject-disentanglement mechanism injects each subject's semantics, derived from the text condition, into its corresponding visual region. In addition, a four-stage data pipeline is constructed to mitigate copy-paste artifacts. Extensive experiments show that MAGREF outperforms existing state-of-the-art methods.
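To illustrate the masked-guidance idea described above, here is a minimal NumPy sketch: each reference subject's features are kept only inside its assigned region, and a region mask is concatenated channel-wise with the latent so the generator knows which pixels each reference governs. All function names, shapes, and the exact conditioning layout are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def masked_guidance_input(latent, references, masks):
    """Hypothetical sketch of region-aware masking + pixel-wise channel concatenation.

    latent:     (C, H, W) noisy latent
    references: list of (C, H, W) per-subject reference features
    masks:      list of (H, W) binary region masks, one per subject
    """
    cond = np.zeros_like(latent)
    region = np.zeros(latent.shape[1:], dtype=latent.dtype)
    for ref, m in zip(references, masks):
        cond += ref * m[None]            # keep each subject's appearance only in its region
        region = np.maximum(region, m)   # union of all subject regions
    # pixel-wise channel concatenation: [noisy latent | masked references | region mask]
    return np.concatenate([latent, cond, region[None]], axis=0)
```

Restricting each reference to its own region is what keeps multiple subjects from entangling: pixels outside a subject's mask carry no appearance signal from that subject.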

Takeaways, Limitations

Takeaways:
  • Presents a novel framework for generating high-quality videos from diverse reference images and text prompts.
  • Proposes methods that address identity consistency, inter-subject entanglement, and copy-paste artifacts (masked guidance, subject-disentanglement mechanism, four-stage data pipeline).
  • Demonstrates performance surpassing existing state-of-the-art methods.
Limitations:
  • The paper does not explicitly discuss its limitations.