
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization

Created by
  • Haebom

Author

Wenchuan Wang, Mengqi Huang, Yijing Tu, Zhendong Mao

Outline

This paper addresses identity and motion consistency in personalized text-to-video generation with pre-trained large-scale models. Existing works follow an independent customization paradigm that customizes either identity or motion dynamics in isolation. This paradigm ignores the inherent mutual constraints and synergistic interdependencies between identity and motion, introducing identity-motion conflicts throughout the generation process and systematically degrading performance. To address this, the authors present DualReal, a novel framework that uses adaptive joint training to jointly model the interdependencies between the two dimensions. DualReal consists of two mechanisms: (1) Dual-aware Adaptation dynamically switches between training stages (i.e., identity or motion), learns the current dimension guided by the frozen prior of the other dimension, and employs a regularization strategy to prevent knowledge leakage; (2) the StageBlender Controller leverages the denoising step and the depth of the Diffusion Transformer to adaptively guide the two dimensions at a fine granularity, avoiding conflicts across stages and ultimately achieving lossless fusion of identity and motion patterns. The authors also construct a more comprehensive evaluation benchmark than existing methods. Experiments show that DualReal improves the CLIP-I and DINO-I metrics by an average of 21.7% and 31.8%, respectively, and achieves the best performance on almost all motion metrics.
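The paper does not release implementation details here, but the two mechanisms can be illustrated with a hypothetical sketch. Below, `stage_blend_weight` is an invented function (not from the paper) showing one plausible way a StageBlender-style controller could weight identity vs. motion contributions as a function of denoising step and transformer depth, and `joint_train_step` mimics Dual-aware Adaptation's alternation between stages with the other dimension held fixed as a prior. All names and the weighting formula are assumptions for illustration only.

```python
import math

def stage_blend_weight(t, num_steps, depth, num_layers):
    """Hypothetical blending rule (not the paper's actual formula):
    early denoising steps and shallow layers weight motion more heavily;
    late steps and deep layers weight identity more heavily."""
    progress = 1.0 - t / num_steps          # 0 = start of denoising, 1 = end
    depth_frac = depth / num_layers          # relative depth in the DiT
    z = 0.5 * progress + 0.5 * depth_frac    # combine the two signals
    w_identity = 1.0 / (1.0 + math.exp(-6.0 * (z - 0.5)))  # smooth gate
    return w_identity, 1.0 - w_identity      # weights sum to 1

def joint_train_step(step, adapters):
    """Alternate training stages: update one dimension's adapter while
    the other serves as a frozen prior (a sketch of the idea only)."""
    active = "identity" if step % 2 == 0 else "motion"
    frozen = "motion" if active == "identity" else "identity"
    adapters[active]["trainable"] = True
    adapters[frozen]["trainable"] = False    # fixed dimensional prior
    return active

# Example: identity weight grows as denoising progresses
w_early, _ = stage_blend_weight(t=900, num_steps=1000, depth=1, num_layers=24)
w_late, _ = stage_blend_weight(t=100, num_steps=1000, depth=23, num_layers=24)
```

In this sketch the gate is a simple sigmoid over normalized step and depth; the actual controller presumably learns this routing rather than hard-coding it.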

Takeaways, Limitations

Takeaways:
DualReal is an adaptive joint training framework that accounts for the interdependence of identity and motion.
It improves the CLIP-I and DINO-I metrics over existing methods (average improvements of 21.7% and 31.8%, respectively).
It achieves top performance on almost all motion metrics.
The authors build a more comprehensive evaluation benchmark.
Limitations:
DualReal's performance gains may be limited to specific datasets or models.
The joint training process adds complexity and computational cost.
Further research is needed on generalization to real-world applications.