Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models

Created by
  • Haebom

Author

Lucy Xiaoyang Shi, Brian Ichter, Michael Equi, Liyiming Ke, Karl Pertsch, Quan Vuong, James Tanner, Anna Walling, Haohuan Wang, Niccolo Fusai, Adrian Li-Bell, Danny Driess, Lachy Groom, Sergey Levine, Chelsea Finn

Outline

This paper presents a general-purpose robotic system that can carry out a variety of tasks in open environments. The system processes complex instructions, prompts, and feedback, and plans tasks step by step. A hierarchical vision-language model first parses the complex command and any user feedback and infers the most appropriate next step; the system then executes that step as low-level actions. In contrast to direct instruction following, which handles simple commands (“Pick up the cup”), the system can understand complex prompts and incorporate context-sensitive feedback (“That’s not trash”) while a task is in progress. It is evaluated on three robotic platforms (a single-arm, a dual-arm, and a dual-arm mobile robot) on tasks such as table clearing, sandwich making, and grocery shopping.
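
The two-level control flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `HierarchicalController`, `toy_high_level`, and `toy_low_level` are hypothetical, and the rule-based functions merely stand in for the vision-language models; none of this is taken from the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical interfaces; the summary does not specify any of these.
# High level: (task prompt, observation, feedback so far) -> next-step command
HighLevelPolicy = Callable[[str, str, List[str]], str]
# Low level: (observation, command) -> action
LowLevelPolicy = Callable[[str, str], str]


@dataclass
class HierarchicalController:
    high_level: HighLevelPolicy
    low_level: LowLevelPolicy
    feedback: List[str] = field(default_factory=list)

    def step(self, prompt: str, observation: str,
             user_feedback: Optional[str] = None) -> str:
        # Record any new user feedback so the high level can condition on it.
        if user_feedback:
            self.feedback.append(user_feedback)
        # The high level infers the next simple command from the full context.
        command = self.high_level(prompt, observation, self.feedback)
        # The low level turns that simple command into an action.
        return self.low_level(observation, command)


def toy_high_level(prompt: str, observation: str, feedback: List[str]) -> str:
    # Rule-based stand-in for a vision-language model (illustration only).
    if "That's not trash" in feedback:
        return "put the object back on the table"
    return "pick up the cup"


def toy_low_level(observation: str, command: str) -> str:
    # Stand-in for a learned action policy: just echoes the command.
    return f"execute<{command}>"


controller = HierarchicalController(toy_high_level, toy_low_level)
print(controller.step("Clear the table", "frame-0"))
print(controller.step("Clear the table", "frame-1", "That's not trash"))
```

The point of the split is that the low level only ever sees a simple command, while the prompt and the accumulated feedback are handled entirely by the high level.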

Takeaways, Limitations

Takeaways:
Demonstrates the potential to build robotic systems that can process complex language commands and contextual feedback.
Experimentally verifies the system's applicability across several robot platforms.
Achieves efficient task performance by using vision-language models hierarchically.
Limitations:
Further analysis is needed on the generalizability and robustness of the system presented in the paper.
Scalability across varied environments and tasks remains to be verified.
Further research is needed on the ability to handle unexpected situations or errors.