Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

Created by
  • Haebom

Authors

Lizhen Wang, Zhurong Xia, Tianshu Hu, Pengrui Wang, Pengfei Wei, Zerong Zheng, Ming Zhou, Yuan Zhang, Mingyuan Gao

Outline

This paper addresses the generation of high-quality human-product demonstration videos, which are important for product promotion in e-commerce and digital marketing. Existing frameworks either fail to preserve both human and product identities or lack an understanding of human-product spatial relationships, which leads to unrealistic representations and unnatural interactions. To address this, the paper proposes a Diffusion Transformer (DiT)-based framework. The method injects paired human-product reference information and uses an additional masked cross-attention mechanism to preserve human identity and product details, such as logos and textures, at the same time. It uses 3D body mesh templates and product bounding boxes to provide accurate motion guidance, so that hand gestures align with product placement. It also incorporates category-level semantics through structured text encoding, which improves 3D consistency under small rotational changes between frames. Trained on a hybrid dataset with extensive data augmentation strategies, the approach outperforms state-of-the-art methods in preserving human and product identities and in generating realistic demonstration motions.
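To make the idea of masked cross-attention concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation: the routing rule (tokens inside the product bounding box attend only to a product reference token, all others only to a human reference token), the helper names `bbox_to_mask` and `masked_cross_attention`, and the toy 2x2 grid are all assumptions made for illustration.

```python
import math


def bbox_to_mask(height, width, bbox):
    """Rasterize a box into a height x width grid of booleans.

    bbox = (x0, y0, x1, y1) in grid-cell coordinates, half-open on the
    right and bottom edges. This box format is an assumption.
    """
    x0, y0, x1, y1 = bbox
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)]
            for y in range(height)]


def masked_cross_attention(queries, keys, values, allowed):
    """Scaled dot-product cross-attention with a boolean mask.

    queries: list of n_q vectors of length d
    keys:    list of n_k vectors of length d
    values:  list of n_k vectors of length d_v
    allowed: allowed[i][j] is True if query i may attend to key j
    """
    d = len(queries[0])
    d_v = len(values[0])
    outputs = []
    for q, row in zip(queries, allowed):
        # Disallowed pairs get a score of -inf, so their softmax weight is 0.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if ok else -math.inf
            for k, ok in zip(keys, row)
        ]
        top = max(scores)
        if top == -math.inf:
            # A query with no allowed keys receives a zero vector.
            outputs.append([0.0] * d_v)
            continue
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(d_v)
        ])
    return outputs


# Toy example: a 2x2 grid of video tokens; the product box covers the
# right column. Key 0 stands for a human reference token and key 1 for a
# product reference token (hypothetical layout).
grid_mask = bbox_to_mask(2, 2, (1, 0, 2, 2))
in_box = [cell for row in grid_mask for cell in row]  # row-major flatten
allowed = [[not inside, inside] for inside in in_box]

queries = [[0.3, -0.1]] * 4
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

routed = masked_cross_attention(queries, keys, values, allowed)
```

In this toy setup each token is allowed exactly one key, so `routed` simply copies the human value for the left column and the product value for the right column. A real model would use many reference tokens per source and learned projections, but the masking step would work the same way.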

Takeaways, Limitations

Takeaways:
Generates high-quality human-product demonstration videos that preserve both human and product identities.
Produces natural interactions by providing accurate motion guidance from 3D body mesh templates and product bounding boxes.
Improves 3D consistency by incorporating category-level semantics through structured text encoding.
Improves performance through data augmentation strategies.
Outperforms state-of-the-art methods in preserving human and product identities and in generating realistic demonstration motions.
Limitations:
The generalization performance of the proposed method needs further evaluation.
Applicability to diverse product categories and complex interactions still needs to be validated.
The dataset used may be limited in size and diversity.
Computational cost and processing time need to be considered.