Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

Created by
  • Haebom

Authors

Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu

Outline

This paper presents DisCoRD (Discrete Tokens to Continuous Motion via Rectified Flow Decoding), a method that bridges discrete and continuous motion representations by using rectified flow to decode discrete motion tokens into the continuous raw motion space. Existing discrete generation methods suffer from limited expressiveness and frame-level noise artifacts, while continuous approaches struggle to follow conditioning signals. DisCoRD frames token decoding as a conditional generation task, which lets it capture subtle motions and produce smoother, more natural motion. Across a variety of settings, it improves naturalness while staying faithful to the conditioning signals, achieving state-of-the-art FID scores of 0.032 on HumanML3D and 0.169 on KIT-ML.
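To make the decoding idea concrete, here is a minimal sketch of rectified flow sampling conditioned on discrete tokens. It is an illustration under stated assumptions, not the authors' implementation: the codebook, the token sequence, the 2-D "pose" vectors, and the names rectified_flow_pair, euler_decode, and oracle_velocity are all hypothetical, and the closed-form oracle velocity stands in for the trained network that the paper would use.

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Training pair for rectified flow: a point on the straight line
    from noise x0 to data x1, and the constant velocity x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0

def euler_decode(velocity_fn, cond, shape, num_steps=8, seed=0):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)   # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                   # t stays below 1, so 1 - t > 0
        x = x + dt * velocity_fn(x, t, cond)
    return x

# Hypothetical toy setup: 3 codebook entries, each a 2-D "pose" vector.
codebook = np.array([[0.0, 0.0], [1.0, -1.0], [2.0, 0.5]])
tokens = np.array([0, 2, 1, 1])      # discrete motion tokens, one per frame
cond = codebook[tokens]              # per-frame conditioning, shape (4, 2)

def oracle_velocity(x, t, cond):
    """ASSUMPTION: stand-in for a learned network. When the target is known,
    the straight-line velocity from x at time t is (target - x) / (1 - t)."""
    return (cond - x) / (1.0 - t)

motion = euler_decode(oracle_velocity, cond, cond.shape)
```

With the oracle, the sampler simply lands on the codebook vectors, so the toy shows only the mechanics. A trained, token-conditioned velocity network would replace oracle_velocity and could add detail that the quantized tokens do not carry. Each Euler step costs one evaluation of the velocity function, so num_steps trades decoding cost against integration accuracy.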

Takeaways, Limitations

Takeaways:
It combines the efficiency of discrete representations with the realism of continuous representations for human motion generation.
The rectified flow token decoder is compatible with various discrete token-based frameworks.
It achieves state-of-the-art performance on the HumanML3D and KIT-ML datasets.
It generates smoother, more natural motion.
Limitations:
Further research is needed to evaluate the generalization performance of the method presented in this paper.
Performance evaluation on other motion datasets is required.
The computational cost of rectified flow decoding can be high.
Further validation of its applicability to very large datasets is needed.