Enabling humanoid robots to reason, plan, and act in complex open environments remains a challenging task. In this paper, we present Humanoid World Models (HWM), a lightweight, open-source model that predicts future egocentric video conditioned on a humanoid robot's control tokens. Using 100 hours of humanoid robot demonstration data, we train two types of generative models, Masked Transformers and Flow-Matching, and explore architectural variations with different attention mechanisms and parameter-sharing strategies. Parameter sharing reduces model size by 33-53% with minimal impact on performance or visual fidelity. HWM is designed to be trained and deployed in practical academic and small-scale research settings, such as on one or two GPUs.
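As a rough illustration of the kind of parameter sharing referenced above (weight tying across transformer layers), the following sketch compares a tied stack against a standard stack; the class names (`SharedStack`, `UnsharedStack`) and hyperparameters are hypothetical and do not reflect HWM's actual sharing strategy or architecture.

```python
# Minimal sketch of layer-level weight tying in a transformer backbone.
# Not HWM's implementation; only illustrates how sharing shrinks parameter count
# while keeping the same depth of computation.
import torch.nn as nn


class SharedStack(nn.Module):
    """Applies one transformer layer `depth` times (weights tied across layers)."""

    def __init__(self, d_model=256, nhead=8, depth=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):
            x = self.layer(x)
        return x


class UnsharedStack(nn.Module):
    """Standard stack with independent weights in every layer."""

    def __init__(self, d_model=256, nhead=8, depth=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(depth)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def n_params(module):
    return sum(p.numel() for p in module.parameters())


if __name__ == "__main__":
    # Same compute per forward pass, but the tied stack stores far fewer weights.
    print(f"unshared: {n_params(UnsharedStack()):,} parameters")
    print(f"shared:   {n_params(SharedStack()):,} parameters")
```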