Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Humanoid World Models: Open World Foundation Models for Humanoid Robotics

Created by
  • Haebom

Authors

Muhammad Qasim Ali, Aditya Sridhar, Shahbuland Matiana, Alex Wong, Mohammad Al-Sharman

Overview

Enabling humanoid robots to reason, plan, and act in complex open environments remains challenging. In this paper, we present Humanoid World Models (HWM), a lightweight open-source model that predicts future egocentric video conditioned on a humanoid robot's control tokens. Using 100 hours of humanoid robot demonstration data, we train two types of generative models, Masked Transformers and Flow-Matching models, and explore architecture variants with different attention mechanisms and parameter-sharing strategies. Parameter sharing reduces the model size by 33-53% with minimal impact on performance or visual fidelity. HWM is designed to be trained and deployed in practical academic and small-scale research settings, for example on one or two GPUs.
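
As a rough illustration of why parameter sharing shrinks a model, the sketch below counts transformer block weights with and without cross-layer sharing. The sharing scheme (every `share_every` consecutive layers reuse one block), the layer count, and the model width are all assumptions made for illustration. This summary does not describe HWM's actual sharing strategy, so the numbers here are not the paper's 33-53% figures.

```python
def block_params(d_model: int, mlp_ratio: int = 4) -> int:
    """Weights in one transformer block, ignoring biases and norm layers."""
    attention = 4 * d_model * d_model        # Q, K, V, and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up- and down-projections
    return attention + mlp


def model_params(num_layers: int, d_model: int, share_every: int = 1) -> int:
    """Total block weights when each group of `share_every` layers reuses one block.

    share_every=1 means no sharing (hypothetical scheme, not taken from the paper).
    """
    unique_blocks = -(-num_layers // share_every)  # ceiling division
    return unique_blocks * block_params(d_model)


def reduction(num_layers: int, d_model: int, share_every: int) -> float:
    """Fraction of block weights saved relative to the unshared model."""
    base = model_params(num_layers, d_model)
    shared = model_params(num_layers, d_model, share_every)
    return 1 - shared / base


if __name__ == "__main__":
    # Illustrative sizes only: 12 layers, width 512, pairs of layers share weights.
    print(f"{reduction(12, 512, 2):.0%}")  # prints "50%"
```

The depth of the network, and so its compute per forward pass, is unchanged by this kind of sharing; only the number of distinct weights stored drops.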

Takeaways, Limitations

Takeaways:
Provides a lightweight, open-source humanoid robot world model (HWM), making such models more accessible to academia and small research labs.
Proposes parameter sharing as an effective way to reduce model size.
Compares and analyzes the performance of two types of generative models: Masked Transformers and Flow-Matching.
Demonstrates the potential of the world model as a dynamics model for long-term planning and policy learning.
Limitations:
Because the model was trained on 100 hours of data, further research with larger and more diverse datasets is needed.
The work currently focuses on egocentric video prediction; integration with other sensor modalities (e.g., force sensors, joint angles) needs further research.
No experimental results on a real robot are presented, so practical applicability requires further verification.
Generalization across different environments and tasks needs further evaluation.