Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Do You Need Proprioceptive States in Visuomotor Policies?

Created by
  • Haebom

Authors

Juntu Zhao, Wenbo Lu, Di Zhang, Yufeng Liu, Yushen Liang, Tianluo Zhang, Yifeng Cao, Junyuan Xie, Yingdong Hu, Shengjie Wang, Junliang Guo, Dequan Wang, Yang Gao

Outline

This paper examines a limitation of imitation-learning-based visuomotor policies for robotic manipulation that take both visual observations and proprioceptive state as input. The authors show experimentally that such policies rely excessively on proprioceptive state, which causes them to overfit to the training trajectories and generalize poorly to new spatial configurations. They therefore propose a "state-free policy" that drops proprioceptive state entirely and predicts actions from visual observations alone. The policy is built on a relative end-effector action space and receives full task-relevant visual observations from dual wide-angle wrist cameras. Across real-world tasks including pick-and-place, shirt folding, and complex whole-body manipulation, spanning multiple robot embodiments, the state-free policy generalizes significantly better than state-based policies: the average success rate rises from 0% to 85% in height generalization and from 6% to 64% in horizontal generalization. It also shows advantages in data efficiency and cross-embodiment adaptation, making it more practical for real-world deployment.
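To make "relative end-effector action space" concrete, here is a minimal sketch, assuming poses are 4×4 homogeneous transforms and an action is the delta pose expressed in the current end-effector frame. The paper's exact parametrization (rotation representation, action chunking, and so on) may differ, and the helpers `pose`, `relative_action`, and `apply_action` are hypothetical names used only for illustration. The example shows why such actions need no absolute position input: the same motion demonstrated at two heights yields identical relative actions.

```python
import numpy as np


def pose(x: float, y: float, z: float, yaw: float = 0.0) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a position and a yaw angle (radians)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = [x, y, z]
    return T


def relative_action(T_current: np.ndarray, T_target: np.ndarray) -> np.ndarray:
    """Hypothetical helper: delta pose in the current end-effector frame, T_current^-1 @ T_target."""
    return np.linalg.inv(T_current) @ T_target


def apply_action(T_current: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Hypothetical controller step: compose the predicted delta with the measured current pose."""
    return T_current @ delta


# The same approach-and-rotate motion demonstrated at two different table heights.
low_start, low_goal = pose(0.4, 0.0, 0.10), pose(0.4, 0.0, 0.05, yaw=0.3)
high_start, high_goal = pose(0.4, 0.0, 0.30), pose(0.4, 0.0, 0.25, yaw=0.3)

a_low = relative_action(low_start, low_goal)
a_high = relative_action(high_start, high_goal)

# The absolute targets differ, but the relative actions match, so a policy that
# predicts them does not need the absolute end-effector height as an input.
print(np.allclose(a_low, a_high))
print(np.allclose(apply_action(high_start, a_high), high_goal))
```

In this sketch the robot's measured pose is still used at execution time, when the controller composes it with the predicted delta; it is simply not an input to the policy itself.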

Takeaways, Limitations

Takeaways:
We show that excessive reliance on proprioceptive state information hinders spatial generalization in robot manipulation.
We demonstrate that a state-free policy using only visual input significantly improves spatial generalization, data efficiency, and cross-embodiment adaptation.
We verify these gains experimentally across diverse tasks (pick-and-place, shirt folding, whole-body manipulation, etc.) and multiple robot embodiments.
We present a practical policy design for real-world robotic manipulation.
Limitations:
Performance may depend on a specific visual sensing setup, such as dual wide-angle wrist cameras that provide full task-relevant observations.
Further research is needed to determine whether the generalization gains extend to all types of robot manipulation tasks.
Robustness to a wider range of environmental changes still needs to be verified.