Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

Created by
  • Haebom

Authors

Yanjiang Guo, Lucy Xiaoyang Shi, Jianyu Chen, Chelsea Finn

Overview

Evaluating and improving how well generalist robot policies handle new objects and instructions is difficult. To address this, we propose a controllable multi-view world model. The model maintains long-horizon consistency with a pose-conditioned memory retrieval mechanism and achieves precise action control through frame-level action conditioning. Trained on the DROID dataset, it generates spatially and temporally consistent trajectories lasting over 20 seconds in new scenarios and under new camera configurations. It evaluates policy performance accurately without physical robot rollouts. It also synthesizes successful trajectories virtually, and supervised learning on these trajectories improves the policy success rate by 44.7%.
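The summary names two mechanisms without describing them. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `retrieve_memory` treats pose-conditioned memory retrieval as a nearest-neighbour lookup over the poses stored with past frames. `frame_level_conditioning` pairs each predicted frame with its own action instead of conditioning a whole clip on one action sequence. All names, the 3-D pose representation, and the concatenation step are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MemoryEntry:
    """A previously generated frame and the pose it was generated at (hypothetical structure)."""
    frame_id: int
    pose: Sequence[float]  # assumed: end-effector position (x, y, z)


def retrieve_memory(history: List[MemoryEntry], query_pose: Sequence[float], k: int) -> List[int]:
    """Return ids of the k stored frames whose poses are closest to query_pose.

    Plain Euclidean nearest-neighbour search, used only as an illustrative
    stand-in for pose-conditioned memory retrieval.
    """
    ranked = sorted(history, key=lambda entry: math.dist(entry.pose, query_pose))
    return [entry.frame_id for entry in ranked[:k]]


def frame_level_conditioning(
    frame_latents: List[List[float]], actions: List[List[float]]
) -> List[List[float]]:
    """Attach exactly one action to each predicted frame (frame-level, not clip-level).

    Concatenation is a hypothetical simplification; a learned model would
    more likely inject an action embedding into each frame.
    """
    if len(frame_latents) != len(actions):
        raise ValueError("expected exactly one action per predicted frame")
    return [latent + action for latent, action in zip(frame_latents, actions)]


if __name__ == "__main__":
    history = [
        MemoryEntry(0, (0.0, 0.0, 0.0)),
        MemoryEntry(1, (0.5, 0.0, 0.0)),
        MemoryEntry(2, (0.1, 0.0, 0.0)),
    ]
    print(retrieve_memory(history, (0.0, 0.0, 0.05), k=2))  # [0, 2]
    print(frame_level_conditioning([[1.0], [2.0]], [[0.1], [0.2]]))  # [[1.0, 0.1], [2.0, 0.2]]
```

Retrieving by pose rather than by recency is one way a model could bring back views of the scene it has seen before when the arm returns to a similar position, which is the kind of long-horizon consistency the summary describes.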

Takeaways, Limitations

Takeaways:
A controllable world model that can evaluate generalist robot policies without real robot rollouts.
A method that improves policy performance by generating successful trajectories in a virtual environment.
A model architecture that maintains long-horizon consistency while enabling precise action control.
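The first two takeaways, evaluating a policy without real robot rollouts and generating successful trajectories virtually, can be read as one loop: roll the policy out inside the world model, check the outcome, and keep the successes as training data. Below is a minimal sketch assuming toy 1-D observations and actions. `imagine_rollout`, `evaluate_and_collect`, and the success check `is_success` are hypothetical names, and this page does not say how success is judged in the paper. The supervised-learning step on the collected trajectories is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Obs = float      # toy observation (assumption)
Action = float   # toy action (assumption)
Trajectory = List[Tuple[Obs, Action]]


def imagine_rollout(
    policy: Callable[[Obs], Action],
    world_model: Callable[[Obs, Action], Obs],
    init_obs: Obs,
    horizon: int,
) -> Tuple[Trajectory, Obs]:
    """Roll the policy forward inside the world model instead of on a real robot."""
    obs = init_obs
    trajectory: Trajectory = []
    for _ in range(horizon):
        action = policy(obs)
        trajectory.append((obs, action))
        obs = world_model(obs, action)  # predicted next observation
    return trajectory, obs


def evaluate_and_collect(
    policy: Callable[[Obs], Action],
    world_model: Callable[[Obs, Action], Obs],
    is_success: Callable[[Obs], bool],
    init_obs_list: Sequence[Obs],
    horizon: int,
) -> Tuple[float, List[Trajectory]]:
    """Estimate the success rate in imagination and keep successful rollouts."""
    if not init_obs_list:
        raise ValueError("need at least one initial observation")
    successes: List[Trajectory] = []
    for init_obs in init_obs_list:
        trajectory, final_obs = imagine_rollout(policy, world_model, init_obs, horizon)
        if is_success(final_obs):
            successes.append(trajectory)
    return len(successes) / len(init_obs_list), successes


if __name__ == "__main__":
    rate, data = evaluate_and_collect(
        policy=lambda obs: 0.25,                   # always step toward the goal
        world_model=lambda obs, act: obs + act,    # toy deterministic dynamics
        is_success=lambda obs: abs(obs - 1.0) < 0.1,
        init_obs_list=[0.0, 0.5, -1.0],
        horizon=4,
    )
    print(len(data))  # 1 (only the rollout starting at 0.0 ends near the goal)
```

The same rollouts serve both purposes: the success rate is the evaluation signal, and the successful trajectories form the dataset for supervised learning.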
Limitations:
The model is trained on a single dataset (DROID); how well it generalizes to other datasets or environments requires further research.
Its long-horizon stability and its applicability to more complex environments need further evaluation.
The gains in policy success rate may be bounded, and further improvements are needed to reach higher performance.