Daily Arxiv

This page curates AI-related papers published worldwide.
All summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making

Created by
  • Haebom

Authors

Wenbo Li, Shiyi Wang, Yiteng Chen, Huiping Zhuang, Qingyao Wu

Outline

In robotic manipulation with Vision-Language Models (VLMs), existing methods compress task information into intermediate representations that lose important details. To address this, we propose AntiGrounding, a framework that directly lifts candidate actions into the VLM representation space, renders their trajectories from multiple viewpoints, and performs instruction-based decision making via structured visual question answering. This enables zero-shot synthesis of optimal closed-loop robot trajectories for new tasks. We also propose an offline policy improvement module that leverages past experience to improve long-term performance. Results in both simulation and real-world experiments show that the proposed method outperforms existing methods on a variety of robotic manipulation tasks.
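The summary above is high-level, so the sketch below only illustrates the shape of such a render-and-ask loop. It is not the authors' implementation: sampler, renderer, and vlm.ask are hypothetical placeholders for whatever action sampler, trajectory renderer, and VLM interface the paper actually uses.

```python
# Minimal sketch of an AntiGrounding-style selection step, assuming a
# generic action sampler, a trajectory renderer, and a VLM with a simple
# ask(prompt, images) interface. All of these are placeholders, not the
# authors' API.
from dataclasses import dataclass

@dataclass
class Candidate:
    trajectory: list           # waypoints of one candidate robot action
    views: list | None = None  # multi-view renderings of that trajectory

def select_action(vlm, renderer, sampler, instruction, observation):
    """One closed-loop step: sample candidates, render, ask the VLM, pick."""
    candidates = [Candidate(t) for t in sampler(observation)]
    for c in candidates:
        # "Lift" the action into the VLM's input space by rendering the
        # trajectory over the current scene from several viewpoints,
        # instead of reducing the scene to an intermediate representation.
        c.views = renderer(observation, c.trajectory, n_views=3)

    # Structured VQA: the VLM is asked which rendered candidate best
    # fulfils the instruction.
    prompt = (
        f"Instruction: {instruction}\n"
        "Each image group shows one candidate trajectory from multiple views.\n"
        "Answer with only the index of the candidate that best completes the task."
    )
    images = [img for c in candidates for img in c.views]
    answer = vlm.ask(prompt, images)   # assumed to return an index string
    return candidates[int(answer)].trajectory
```

In the closed-loop setting described above, the selected trajectory would be executed, a new observation taken, and the step repeated; per the summary, the offline policy improvement module would additionally reuse such past interactions to improve long-term performance.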

Takeaways, Limitations

Takeaways:
• Directly using the high-dimensional representation space of the VLM improves the accuracy and efficiency of robotic manipulation.
• Zero-shot operation improves adaptability to new tasks.
• The offline policy improvement module yields long-term performance gains.
• Strong performance is demonstrated in both simulation and real-world environments.
Limitations:
• The generalization performance of the proposed method needs further study.
• Performance in complex, unpredictable environments has not yet been evaluated.
• The learning efficiency of the offline policy improvement module may need improvement.