This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
This paper highlights the importance of generating high-quality human-product demonstration videos for effective product promotion in e-commerce and digital marketing. Existing frameworks either fail to preserve both human and product identities or lack an understanding of human-product spatial relationships, which leads to unrealistic representations and unnatural interactions. To address this, the paper proposes a framework based on the Diffusion Transformer (DiT). The method injects paired human-product reference information and uses an additional masked cross-attention mechanism to preserve both human identity and product details such as logos and textures. It uses 3D body mesh templates and product bounding boxes to provide precise motion guidance, so that hand gestures align intuitively with product placement. It also incorporates category-level semantics through structured text encoding, which improves 3D consistency under small rotational changes between frames. Trained on a hybrid dataset with extensive data augmentation, the approach outperforms state-of-the-art methods in preserving human and product identities and in generating realistic demonstration motions.
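The masked cross-attention idea can be illustrated with a small sketch. This is not the paper's implementation; it is a minimal illustration under stated assumptions. It assumes that only video latent tokens whose cell centers fall inside the product bounding box may attend to the product reference tokens, and that the attention output is added residually, so a zero vector means "no product information injected." It also leaves out the learned query/key/value projections and the multiple heads of a real DiT block. The helper names `bbox_token_mask` and `masked_cross_attention` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def bbox_token_mask(grid_h: int, grid_w: int,
                    box: Tuple[float, float, float, float]) -> List[bool]:
    """Row-major mask over a grid_h x grid_w grid of latent tokens.

    Hypothetical helper: `box` is (x0, y0, x1, y1) in normalized [0, 1]
    image coordinates. A token is allowed if its cell center lies in the box.
    """
    x0, y0, x1, y1 = box
    mask = []
    for row in range(grid_h):
        for col in range(grid_w):
            cx = (col + 0.5) / grid_w
            cy = (row + 0.5) / grid_h
            mask.append(x0 <= cx <= x1 and y0 <= cy <= y1)
    return mask


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_attention(queries: Matrix, keys: Matrix, values: Matrix,
                           allowed: Sequence[bool]) -> Matrix:
    """Scaled dot-product cross-attention from video tokens (queries) to
    product reference tokens (keys/values).

    Tokens with allowed[i] == False receive a zero vector, which under the
    residual-add assumption means they get no product information.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    out: Matrix = []
    for q, ok in zip(queries, allowed):
        if not ok:
            out.append([0.0] * dim_v)
            continue
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(dim_v)])
    return out


if __name__ == "__main__":
    # 2x2 token grid; the product box covers only the top-left cell.
    mask = bbox_token_mask(2, 2, (0.0, 0.0, 0.5, 0.5))
    print(mask)  # [True, False, False, False]
    out = masked_cross_attention(
        queries=[[1.0, 0.0]] * 4,
        keys=[[1.0, 0.0], [1.0, 0.0]],
        values=[[1.0, 0.0], [0.0, 1.0]],
        allowed=mask,
    )
    print(out)  # [[0.5, 0.5], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
```

Real attention layers usually apply such a mask by setting disallowed logits to negative infinity before the softmax. In this sketch every product key is disallowed for an out-of-box token, so the code skips the softmax for those tokens and returns zeros directly, which avoids a softmax over an all-masked row.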
Takeaways, Limitations
• Takeaways:
  ◦ Generates high-quality human-product demonstration videos that preserve both human and product identities.
  ◦ Produces natural interactions through precise motion guidance from 3D body mesh templates and product bounding boxes.
  ◦ Improves 3D consistency by integrating category-level semantics through structured text encoding.
  ◦ Improves performance through extensive data augmentation strategies.
  ◦ Outperforms state-of-the-art methods in preserving human and product identities and in generating realistic demonstration motions.
• Limitations:
  ◦ The generalization performance of the proposed method needs further evaluation.
  ◦ Applicability to diverse product categories and complex interactions remains to be validated.
  ◦ The size and diversity of the dataset used are limited.
  ◦ Computational cost and processing time need to be considered.