
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized by Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

Created by
  • Haebom

Authors

Ruihan Yang, Qinxi Yu, Yecheng Wu, Rui Yan, Borui Li, An-Chieh Cheng, Xueyan Zou, Yunhao Fang, Xuxin Cheng, Ri-Zhao Qiu, Hongxu Yin, Sifei Liu, Song Han, Yao Lu, Xiaolong Wang

Outline

In this paper, we propose a method for training a Vision-Language-Action (VLA) model on egocentric human videos, addressing the difficulty of collecting real-robot data at scale for imitation learning in robot manipulation. We first train the VLA model on human video data, exploiting its rich scene and task information, and convert human actions into robot actions through inverse kinematics and retargeting. We then fine-tune the model on a small number of robot manipulation demonstrations to obtain a robot policy, EgoVLA, which we evaluate on the Ego Humanoid Manipulation Benchmark, a simulation benchmark covering diverse bimanual manipulation tasks. EgoVLA outperforms existing methods, demonstrating the importance of human data.
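The conversion from human motion to robot actions is the key technical step in this pipeline. As a rough, self-contained illustration of the idea (a minimal sketch, not the authors' implementation), the toy example below retargets a human wrist position into a robot workspace with a fixed scale factor and then solves damped least-squares inverse kinematics for a 2-link planar arm; the link lengths, scale factor, and all function names are assumptions made for this sketch.

```python
import numpy as np

# Illustrative sketch only (not the paper's code): map a human wrist
# position estimated from egocentric video to robot joint angles via
# retargeting + inverse kinematics. The 2-link planar arm geometry and
# the fixed retargeting scale are assumptions for this toy example.

LINK_LENGTHS = (0.3, 0.25)  # meters, assumed robot arm geometry

def forward_kinematics(q):
    """End-effector (x, y) of a 2-link planar arm with joint angles q."""
    l1, l2 = LINK_LENGTHS
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Analytic Jacobian of the 2-link forward kinematics."""
    l1, l2 = LINK_LENGTHS
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def solve_ik(target_xy, q_init, iters=200, damping=1e-3):
    """Damped least-squares IK: iterate joint angles until the end
    effector reaches target_xy (the retargeted human wrist position)."""
    q = np.array(q_init, dtype=float)
    for _ in range(iters):
        err = target_xy - forward_kinematics(q)
        if np.linalg.norm(err) < 1e-5:
            break
        J = jacobian(q)
        # Damped pseudo-inverse keeps the update stable near singularities.
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(2), err)
        q += dq
    return q

def retarget_wrist(human_wrist_xy, scale=0.8):
    """Map a human wrist position into the robot workspace with a fixed
    scale factor (a simple stand-in for full motion retargeting)."""
    return scale * np.asarray(human_wrist_xy)

# Example: a wrist position estimated from one egocentric video frame.
human_wrist = np.array([0.45, 0.30])
robot_target = retarget_wrist(human_wrist)
q = solve_ik(robot_target, q_init=[0.1, 0.1])
print("joint angles:", q, "reached:", forward_kinematics(q))
```

The paper applies the same inverse-kinematics-plus-retargeting idea to full wrist and hand motion for bimanual humanoid manipulation; the sketch above only shows the structure of that conversion on a deliberately simple arm.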

Takeaways, Limitations

Takeaways:
A strategy for leveraging large-scale human video data that overcomes the limitations of collecting real robot data.
An effective method for converting human actions into robot actions (inverse kinematics and retargeting).
A new simulation benchmark (Ego Humanoid Manipulation Benchmark) covering a variety of bimanual manipulation tasks.
Improved performance over existing methods, demonstrating the importance of human data.
Limitations:
Since the evaluation was conducted in simulation, performance on real robots requires further verification.
Differences between human and robot embodiments may cause performance degradation.
Further research is needed on the generalizability of the Ego Humanoid Manipulation Benchmark.