Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Created by
  • Haebom

Authors

Taekyung Ki, Dongchan Min, Gyeongsu Chae

Outline

This paper points out that, despite advances in diffusion-based generative models, portrait animation still struggles to produce temporally consistent video and to sample quickly, because generation relies on iterative sampling. It presents FLOAT, an audio-driven talking portrait video generation method built on a flow matching generative model. Instead of a pixel-based latent space, FLOAT works in a learned orthogonal motion latent space, which enables efficient generation and editing of temporally consistent motion. To this end, the authors introduce a transformer-based vector field estimator with an effective frame-wise conditioning mechanism, and they support speech-driven emotion enhancement so that expressive motion is incorporated naturally. Extensive experiments show that the method outperforms state-of-the-art audio-driven talking portrait methods in visual quality, motion fidelity, and efficiency.
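To make the flow matching idea concrete, here is a minimal NumPy sketch of the general technique, not FLOAT's implementation. It assumes the common straight-line path between a noise sample and a data sample, and it replaces the paper's learned, audio-conditioned transformer estimator with a closed-form toy vector field for a single fixed target. All function names (`interpolate`, `target_velocity`, `euler_sample`, `make_toy_field`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point at time t on the straight-line path from noise x0 to data x1."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Regression target a vector field estimator would be trained to match
    along the straight-line path (assumed path choice, not from the paper)."""
    return x1 - x0


def euler_sample(vector_field, x0, num_steps=10):
    """Generate a sample by integrating dx/dt = vector_field(x, t)
    from t = 0 to t = 1 with fixed-step Euler updates."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * vector_field(x, t)
    return x


def make_toy_field(x1):
    """Closed-form field for a single fixed target x1. This stands in for a
    learned estimator; a real model would be a trained network."""
    def field(x, t):
        return (x1 - x) / (1.0 - t)
    return field


# Start from Gaussian noise and transport it to the toy target.
noise = np.random.default_rng(0).standard_normal(4)
target = np.array([1.0, -2.0, 0.5, 3.0])
sample = euler_sample(make_toy_field(target), noise, num_steps=10)
```

In this toy setting the sampler needs only a handful of vector field evaluations. With a learned estimator, the number of steps generally trades sample quality against speed.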

Takeaways, Limitations

Takeaways:
FLOAT enables temporally consistent and efficient audio-driven talking portrait video generation by combining a flow matching generative model with an orthogonal motion latent space.
A transformer-based vector field estimator with frame-wise conditioning makes natural motion generation and editing possible.
Speech-driven emotion enhancement produces expressive motion.
The method improves visual quality, motion fidelity, and efficiency over existing methods.
Limitations:
The paper does not explicitly discuss its limitations or future research directions.
It is unclear whether the method depends on a specific dataset or hardware environment.
The paper does not discuss problems that may arise in practical applications.