To address the limitations of video diffusion models, this paper proposes WorldForge, a training-free framework applied entirely at inference time. WorldForge comprises three modules that inject precise trajectory guidance into the generation process, enabling accurate motion control and realistic content synthesis. The framework generalizes to a wide range of 3D/4D tasks and outperforms existing methods in trajectory compliance, geometric consistency, and perceptual quality.