This page organizes papers related to artificial intelligence published around the world. This page is summarized using Google Gemini and is operated on a non-profit basis. The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.
This paper proposes AsynFusion, a novel framework for audio-driven generation of avatar facial expressions and gestures. Built on a dual-branch diffusion transformer (DiT) architecture, AsynFusion synthesizes expressions and gestures in parallel; a Cooperative Synchronization Module enables interaction between the two modalities, and an Asynchronous LCM Sampling strategy reduces computational cost while maintaining high-quality output. Experimental results show that AsynFusion outperforms existing methods in real-time, synchronized full-body animation generation.
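To make the dual-branch idea concrete, here is a minimal sketch of two branches denoising in parallel while periodically exchanging information. All names, shapes, and update rules below (`denoise_branch`, `cooperative_sync`, the blending weight `alpha`) are illustrative assumptions, not the paper's actual implementation; a real synchronization module would likely use cross-attention between branch latents rather than simple blending.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # latent dimension (assumed for illustration)

def denoise_branch(latent, weight):
    """One simplified 'denoising' step for a single modality branch."""
    return np.tanh(latent @ weight)

def cooperative_sync(expr, gest, alpha=0.3):
    """Exchange information between the two branch latents.
    Stand-in for the paper's Cooperative Synchronization Module:
    here each latent is blended with the other branch's latent."""
    expr_new = (1 - alpha) * expr + alpha * gest
    gest_new = (1 - alpha) * gest + alpha * expr
    return expr_new, gest_new

# Hypothetical audio-conditioned initial latents and branch weights.
expr = rng.standard_normal(D)
gest = rng.standard_normal(D)
W_e = rng.standard_normal((D, D)) * 0.1
W_g = rng.standard_normal((D, D)) * 0.1

# A few parallel denoising steps with synchronization after each.
for _ in range(4):
    expr = denoise_branch(expr, W_e)
    gest = denoise_branch(gest, W_g)
    expr, gest = cooperative_sync(expr, gest)

print(expr.shape, gest.shape)
```

The key design point this sketch illustrates is that the two branches never block each other: each runs its own denoising update, and only the lightweight synchronization step couples them, which is what permits parallel generation of the two modalities.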
Takeaways, Limitations

• Takeaways:
  ◦ Produces more natural animations through seamless coordination of facial expressions and gestures.
  ◦ Introduces an efficient sampling strategy that enables real-time performance.
  ◦ Demonstrates superior performance over existing methods.
  ◦ Suggests potential applications in fields such as virtual reality, digital entertainment, and remote communication.
• Limitations:
  ◦ The paper does not specify concrete limitations.
  ◦ (Assumption) Further research may be needed on model complexity, training-data dependency, and potential performance degradation in certain environments.