MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training
Author
Haoyun Li, Ivan Zhang, Runqi Ouyang, Xiaofeng Wang, Zheng Zhu, Zhiqin Yang, Zhentao Zhang, Boyuan Wang, Chaojun Ni, Wenkang Qin, Xinze Chen, Yun Ye, Guan Huang, Zhenbo Song, Xingang Wang
Outline
This paper proposes MimicDreamer, a framework for training Vision-Language-Action (VLA) models on readily available human demonstration videos instead of costly robot interaction data. MimicDreamer converts human demonstration videos into a robot-usable format by aligning three components: vision, viewpoint, and action. Specifically, the H2R Aligner generates robot-embodiment demonstration videos from human demonstration videos, EgoStabilizer stabilizes the egocentric viewpoint, and an Action Alignment step maps human hand trajectories into the robot's coordinate frame to produce robot joint commands. Experiments show that a VLA model trained on synthetic data generated by MimicDreamer can perform tasks on a real robot within a small number of trials and outperforms models trained solely on real robot data.
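The summary does not include the authors' code, so the following is a minimal sketch of how the three stages could be composed into a data pipeline. All names here (Frame, h2r_align, ego_stabilize, retarget_actions, T_cam_to_base) are illustrative assumptions, not the paper's released API; the two learned video models are stubbed as identity placeholders so the sketch runs end to end.

```python
# Hypothetical sketch of the MimicDreamer pipeline. The real H2R Aligner
# and EgoStabilizer are learned video models; they are replaced below by
# identity placeholders purely for illustration.
from dataclasses import dataclass

import numpy as np


@dataclass
class Frame:
    image: np.ndarray      # H x W x 3 RGB frame
    hand_pose: np.ndarray  # 6-DoF human hand pose (xyz + rpy), camera frame


def h2r_align(frames: list[Frame]) -> list[Frame]:
    """H2R Aligner: render robot-embodiment video from the human video.
    Placeholder: returns frames unchanged instead of running the model."""
    return frames


def ego_stabilize(frames: list[Frame]) -> list[Frame]:
    """EgoStabilizer: remove egocentric camera shake so the viewpoint
    matches the robot's camera. Placeholder identity transform."""
    return frames


def retarget_actions(frames: list[Frame],
                     T_cam_to_base: np.ndarray) -> list[np.ndarray]:
    """Action alignment: express each hand pose in the robot base frame;
    the real system would then solve inverse kinematics for joint
    commands. Only the rigid position transform is shown here, and the
    orientation is passed through untransformed for brevity."""
    commands = []
    for f in frames:
        xyz_h = np.append(f.hand_pose[:3], 1.0)   # homogeneous position
        xyz_base = (T_cam_to_base @ xyz_h)[:3]    # camera -> base frame
        commands.append(np.concatenate([xyz_base, f.hand_pose[3:]]))
    return commands


def human_video_to_vla_sample(frames: list[Frame],
                              T_cam_to_base: np.ndarray):
    """Compose the three stages into one synthetic (observation, action)
    trajectory for VLA training."""
    robot_frames = ego_stabilize(h2r_align(frames))
    actions = retarget_actions(robot_frames, T_cam_to_base)
    return list(zip(robot_frames, actions))
```

In a real pipeline, the two placeholder stages would be replaced by the learned H2R Aligner and EgoStabilizer models, and the retargeting step would call an inverse-kinematics solver for the target robot.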
Takeaways, Limitations
• Takeaways:
  ◦ Leveraging human demonstration videos can reduce the cost of acquiring data for robot training.
  ◦ The MimicDreamer framework effectively bridges the gap between human demonstration videos and the robot environment.
  ◦ Training on synthetic data improves VLA model performance, achieving better results than using only real robot data.
• Limitations:
  ◦ Because the framework relies on human demonstration data, results may vary with the quality of that data.
  ◦ The performance of the overall framework may be limited by the performance of the H2R Aligner, EgoStabilizer, and Action Alignment modules.
  ◦ Results are reported for only six representative manipulation tasks, so generalization to other robot environments and tasks requires further verification.