Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation

Created by
  • Haebom

Authors

Dongjie Fu, Tengjiao Sun, Pengcheng Fang, Xiaohao Cai, Hansung Kim

Outline

In this paper, we propose MOGO (Motion Generation with One-pass), a novel autoregressive framework for efficient, real-time 3D motion generation. MOGO consists of two main components. First, the Motion Scale-Adaptive Residual Vector Quantization (MoSA-VQ) module hierarchically discretizes motion sequences with learnable scaling, producing compact yet expressive representations. Second, the Residual Quantized Hierarchical Causal Transformer (RQHC-Transformer) generates multi-layer motion tokens in a single forward pass, significantly reducing inference latency. A text-conditional alignment mechanism further improves text-controlled motion decoding. Extensive experiments on benchmark datasets such as HumanML3D, KIT-ML, and CMP show that MOGO achieves generation quality competitive with or superior to state-of-the-art Transformer-based methods, while offering significant improvements in real-time performance, streaming generation, and zero-shot generalization.
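To make the MoSA-VQ idea more concrete, below is a minimal PyTorch sketch of a residual vector quantizer with learnable per-level scaling. This is not the authors' implementation: the class name ResidualVQ, the scalar per-level scales, and all dimensions are illustrative assumptions based only on the summary above.

```python
import torch
import torch.nn as nn


class ResidualVQ(nn.Module):
    """Sketch of scale-adaptive residual vector quantization (assumed form).

    Each level normalizes the current residual by a learnable scale,
    snaps it to the nearest codebook entry, rescales the result, and
    passes the remaining residual on to the next level.
    """

    def __init__(self, dim=256, codebook_size=512, num_levels=4):
        super().__init__()
        self.codebooks = nn.ModuleList(
            [nn.Embedding(codebook_size, dim) for _ in range(num_levels)]
        )
        # One learnable scalar per level -- an assumption about what
        # "learnable scaling" means in MoSA-VQ.
        self.scales = nn.Parameter(torch.ones(num_levels))

    def forward(self, x):
        # x: (batch, frames, dim) continuous motion features from an encoder.
        batch, frames, _ = x.shape
        residual, quantized, codes = x, torch.zeros_like(x), []
        for level, codebook in enumerate(self.codebooks):
            scaled = residual / self.scales[level]
            # Nearest-neighbour lookup against this level's codebook.
            dists = torch.cdist(scaled.flatten(0, 1), codebook.weight)  # (B*T, K)
            idx = dists.argmin(dim=-1).view(batch, frames)              # (B, T)
            q = codebook(idx) * self.scales[level]                      # back to input scale
            quantized = quantized + q
            residual = residual - q                                     # hand residual to next level
            codes.append(idx)
        # Straight-through estimator so gradients flow back to the encoder.
        quantized = x + (quantized - x).detach()
        return quantized, torch.stack(codes, dim=-1)                    # codes: (B, T, num_levels)


# Example: quantize a batch of 2 motion clips, 64 frames each.
motion = torch.randn(2, 64, 256)
recon, codes = ResidualVQ()(motion)
print(recon.shape, codes.shape)  # torch.Size([2, 64, 256]) torch.Size([2, 64, 4])
```

Each level encodes the residual left by the previous levels, which is the kind of hierarchical token structure that the RQHC-Transformer is described as generating in a single forward pass.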

Takeaways, Limitations

Takeaways:
  • Proposes MOGO, a novel framework for efficient, real-time 3D motion generation.
  • The MoSA-VQ module produces concise yet expressive motion representations.
  • The RQHC-Transformer generates multi-layer motion tokens in a single forward pass, reducing inference latency.
  • A text-conditional alignment mechanism improves motion decoding under text control.
  • Achieves generation quality competitive with state-of-the-art methods while improving real-time performance, streaming generation, and zero-shot generalization.
Limitations:
  • The paper does not explicitly discuss its limitations; further experiments and analysis would be needed to identify them.
  • MOGO's performance may be biased toward the benchmark datasets used; evaluation on more diverse datasets is needed.
  • Quantitative analysis of the real-time performance gains may be insufficient; a more detailed performance breakdown is needed.