Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Cut2Next: Generating Next Shot via In-Context Tuning

Created by
  • Haebom

Authors

Jingwen He, Hongbo Liu, Jiajun Li, Ziqi Huang, Yu Qiao, Wanli Ouyang, Ziwei Liu

Outline

This paper argues that cinematic continuity and editing patterns are central to multi-shot generation and presents Cut2Next, a framework designed to overcome the limitations of existing methods. Built on a Diffusion Transformer (DiT), Cut2Next generates the next shot with a hierarchical multi-prompting strategy: relational prompts specify the overall context and the editing style between shots, while individual prompts specify the content and cinematic properties of each shot. Two architectural components, Context-Aware Condition Injection (CACI) and Hierarchical Attention Mask (HAM), integrate these cues without adding parameters. The authors also build a large-scale RawCuts dataset and a refined CuratedCuts dataset, and present CutBench for evaluation. Experiments show that Cut2Next performs well in visual consistency and text fidelity. In user studies, participants strongly preferred its results for adherence to intended editing patterns and cinematic continuity, which supports its ability to generate high-quality, narratively consistent next shots.
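To make the idea of a hierarchical attention mask more concrete, the following is a minimal sketch of a block-structured boolean mask over prompt and shot tokens. It is not the paper's HAM: the group names (`rel`, `cap1`, `cap2`, `shot1`, `shot2`), the token counts, and the visibility rule in `ALLOWED` are all assumptions chosen for illustration. The sketch only shows that a mask of this kind can route relational and per-shot prompts to different tokens without any learned parameters.

```python
# Hypothetical token groups and sizes; none of these names come from the paper.
# "rel" = relational prompt, "cap1"/"cap2" = individual prompts, "shot1"/"shot2" = visual tokens.
GROUPS = {"rel": 2, "cap1": 2, "cap2": 2, "shot1": 3, "shot2": 3}

# Assumed visibility rule: for each query group, the key groups it may attend to.
# Each shot sees the shared relational prompt, its own individual prompt, and both shots.
ALLOWED = {
    "rel": {"rel"},
    "cap1": {"cap1"},
    "cap2": {"cap2"},
    "shot1": {"rel", "cap1", "shot1", "shot2"},
    "shot2": {"rel", "cap2", "shot1", "shot2"},
}


def build_mask(groups, allowed):
    """Return a square matrix where mask[q][k] is True if query token q may attend to key token k."""
    # Expand group sizes into one label per token, in dictionary order.
    labels = [name for name, size in groups.items() for _ in range(size)]
    return [[k in allowed[q] for k in labels] for q in labels]


def render(mask):
    """Draw the mask as text: '#' for allowed attention, '.' for blocked."""
    return "\n".join("".join("#" if v else "." for v in row) for row in mask)


mask = build_mask(GROUPS, ALLOWED)
print(render(mask))
```

Printing the mask shows the block pattern: rows for `shot1` tokens are open to the relational prompt and the first individual prompt but closed to the second, and the reverse holds for `shot2`. In a real attention layer a boolean mask like this would typically be applied to the attention scores before the softmax.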

Takeaways, Limitations

Takeaways:
Opens new possibilities for multi-shot generation that accounts for cinematic continuity and editing patterns.
Makes effective use of a Diffusion Transformer combined with a hierarchical multi-prompting strategy.
Lays a foundation for future research by presenting large-scale datasets (RawCuts, CuratedCuts) and an evaluation benchmark (CutBench).
Supports its subjective quality claims with user studies.
Limitations:
The size and diversity of the presented datasets need further review.
Generalization across film genres and styles still needs to be verified.
Computational cost and processing time need to be considered.
Further research is needed on applicability in real-world film production environments.