Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of each paper belongs to its authors and the relevant institutions. When sharing, please cite the source.

WorldGym: World Model as An Environment for Policy Evaluation

Created by
  • Haebom

Authors

Julian Quevedo, Ansh Kumar Sharma, Yixiang Sun, Varad Suryavanshi, Percy Liang, Sherry Yang

Outline

To address the challenges of evaluating robot control policies, the authors propose WorldGym, an autoregressive, action-conditioned video generation model that serves as a proxy for the real-world environment. WorldGym evaluates a policy through Monte Carlo rollouts inside the world model, with a vision-language model (VLM) providing the rewards. Starting only from initial frames captured on real robots, the authors use WorldGym to evaluate a set of real-world robot policies built on vision-language-action (VLA) models, and show that policy success rates within WorldGym are highly correlated with real-world success rates. They further show that WorldGym preserves the relative ranking of policies across policy versions, sizes, and training checkpoints. Because it needs only a single starting frame, WorldGym can efficiently evaluate how well robot policies generalize to new tasks and environments.
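The rollout-based evaluation described above can be sketched as a simple Monte Carlo loop. This is a minimal illustration under stated assumptions, not the authors' code: the video world model, the policy, and the VLM reward are replaced by hypothetical toy callables (`toy_world_model`, `toy_policy`, `toy_judge`) acting on a one-dimensional stand-in "frame", and the function name `monte_carlo_success_rate`, the number of rollouts, and the horizon are illustrative choices.

```python
import random
from typing import Callable, List

# Hypothetical stand-ins: a real setup would use image frames and robot actions.
Frame = float
Action = float


def monte_carlo_success_rate(
    initial_frame: Frame,
    instruction: str,
    policy: Callable[[Frame, str], Action],
    world_model_step: Callable[[List[Frame], Action], Frame],
    judge_success: Callable[[List[Frame], str], bool],
    num_rollouts: int = 8,
    horizon: int = 20,
) -> float:
    """Estimate a policy's success rate by rolling it out inside a world model.

    Every rollout starts from the same initial frame. At each step the policy
    picks an action from the latest frame, and the world model generates the
    next frame conditioned on the frame history and that action. A judge
    (a VLM in WorldGym; a toy stub here) labels each finished rollout.
    """
    successes = 0
    for _ in range(num_rollouts):
        frames = [initial_frame]
        for _ in range(horizon):
            action = policy(frames[-1], instruction)
            frames.append(world_model_step(frames, action))
        if judge_success(frames, instruction):
            successes += 1
    return successes / num_rollouts


# Seeded RNG so the toy world model's noise is reproducible.
rng = random.Random(0)


def toy_world_model(frames: List[Frame], action: Action) -> Frame:
    # Hypothetical dynamics: position moves by the action plus small Gaussian noise.
    return frames[-1] + action + rng.gauss(0.0, 0.01)


def toy_policy(frame: Frame, instruction: str) -> Action:
    # Hypothetical policy: proportional step toward a goal position of 1.0.
    return 0.3 * (1.0 - frame)


def toy_judge(frames: List[Frame], instruction: str) -> bool:
    # Hypothetical reward: success if the final position is close to the goal.
    return abs(frames[-1] - 1.0) < 0.1


rate = monte_carlo_success_rate(0.0, "move to goal", toy_policy, toy_world_model, toy_judge)
print(f"estimated success rate: {rate:.2f}")
```

In principle, plugging in an action-conditioned video model and a VLM-based success classifier would keep the same loop; only the three callables change, and more rollouts give a less noisy estimate.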

Takeaways, Limitations

Takeaways:
WorldGym provides a practical starting point for safe and reproducible evaluation of real-world robotic policies.
WorldGym is effective for assessing how well policies generalize to new tasks and environments.
Policy success rates in WorldGym are highly correlated with real-world success rates.
WorldGym preserves relative policy rankings across policy versions, sizes, and training checkpoints.
Limitations:
State-of-the-art VLA-based robot policies still struggle to distinguish between object shapes.
Policies may also be hampered by adversarial object appearances.
Generating highly realistic object interactions in the world model remains challenging.
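The correlation and ranking claims above can be checked with standard statistics once success rates are available from both the world model and real robots. The sketch below assumes Pearson correlation and a pairwise rank-agreement score (the fraction of policy pairs ordered the same way in both settings); these are common choices and may differ from the exact metrics used in the paper. The function names are hypothetical, and the success rates in the example are made-up numbers for illustration only, not results from the paper.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def pairwise_rank_agreement(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Fraction of policy pairs ranked in the same order by both lists.

    Pairs that are tied in either list are skipped.
    """
    agree = total = 0
    for i, j in combinations(range(len(xs)), 2):
        dx, dy = xs[i] - xs[j], ys[i] - ys[j]
        if dx == 0 or dy == 0:
            continue
        total += 1
        agree += (dx > 0) == (dy > 0)
    return agree / total if total else float("nan")


# Made-up success rates for four hypothetical policies (illustration only).
world_model_rates = [0.20, 0.35, 0.50, 0.70]
real_rates = [0.15, 0.40, 0.45, 0.80]

r = pearson(world_model_rates, real_rates)
agreement = pairwise_rank_agreement(world_model_rates, real_rates)
print(f"Pearson r = {r:.3f}, pairwise rank agreement = {agreement:.2f}")
```

A high correlation indicates that absolute success rates transfer, while rank agreement checks the weaker but often more useful property that the world model orders policies the same way the real robot does.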