To address the challenge of evaluating and improving how well generalist robot policies handle novel objects and instructions, we propose a controllable multi-view world model. The model uses a pose-conditional memory retrieval mechanism for long-term consistency and frame-level action conditioning for precise action control. Trained on the DROID dataset, it generates spatially and temporally consistent trajectories of over 20 seconds in novel scenes and camera configurations. It evaluates policy performance accurately without requiring physical robot rollouts, and it synthesizes successful trajectories virtually, improving the policy success rate by 44.7% through supervised learning.
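The two conditioning mechanisms named above can be illustrated with a toy sketch. This is not the paper's architecture: the embedding dimensions, the `retrieve`/`denoise_step` functions, and the linear mixing weights are all illustrative stand-ins. It only shows the control flow of pose-conditional memory retrieval (fetch past frames whose camera poses are nearest the query pose) and frame-level action conditioning (each generated frame consumes its own action), under the assumption that frames and actions are represented as fixed-size vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "memory bank": past frame embeddings tagged with the camera pose
# (a 7-vector: 3D position + quaternion) at which each was observed.
memory_poses = rng.normal(size=(50, 7))
memory_frames = rng.normal(size=(50, 16))  # 16-dim frame embeddings

def retrieve(query_pose, k=4):
    """Pose-conditional retrieval: the k memory frames whose poses are
    nearest the query pose (Euclidean distance, a simplifying choice)."""
    dists = np.linalg.norm(memory_poses - query_pose, axis=1)
    return memory_frames[np.argsort(dists)[:k]]

def step(frame, action, retrieved):
    """Stand-in for one world-model update: the next frame embedding is
    conditioned on this frame's action and on the retrieved memory."""
    context = retrieved.mean(axis=0)
    return 0.8 * frame + 0.1 * context + 0.1 * np.tanh(action)

def rollout(frame, poses, actions):
    """Frame-level action conditioning: every generated frame is paired
    with exactly one action, rather than one action per clip."""
    out = []
    for pose, action in zip(poses, actions):
        frame = step(frame, action, retrieve(pose))
        out.append(frame)
    return np.stack(out)

frames = rollout(rng.normal(size=16),
                 rng.normal(size=(20, 7)),    # camera pose per frame
                 rng.normal(size=(20, 16)))   # action per frame
print(frames.shape)  # (20, 16): one embedding per conditioned frame
```

In this sketch the retrieval keeps distant views consistent by reusing what was already generated at nearby poses, while the per-frame action input is what makes the rollout precisely controllable.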