Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Embodied Long Horizon Manipulation with Closed-loop Code Generation and Incremental Few-shot Adaptation

Created by
  • Haebom

Authors

Yuan Meng, Xiangtong Yao, Haihui Ye, Yirui Zhou, Shengqiang Zhang, Zhenguo Sun, Xukun Li, Zhenshan Bing, Alois Knoll

Outline

This paper presents a robot control framework for long-horizon object manipulation. Existing learning-based approaches rely on large, task-specific datasets and struggle to generalize to unseen scenarios. Instead of relying on pretrained low-level controllers, the authors propose a closed-loop framework in which a large language model (LLM) directly generates executable code plans. The planner uses chain-of-thought (CoT)-guided few-shot learning with incrementally structured examples to produce robust, generalizable task plans. A reporter evaluates the outcome from RGB-D observations and returns structured feedback, enabling error correction and replanning under partial observability. This design eliminates per-step inference, reduces computational overhead, and limits the error accumulation observed in previous methods. The framework achieves state-of-the-art performance on more than 30 diverse long-horizon tasks, both seen and unseen, across LoHoRavens, CALVIN, Franka Kitchen, and cluttered real-world environments.
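The plan, execute, report, replan cycle described above can be sketched as a small loop. This is only an illustration, not the authors' implementation: the names `Report`, `closed_loop`, `plan`, `execute`, `report`, and `max_rounds` are all hypothetical, and the LLM planner, robot executor, and RGB-D reporter are passed in as plain callables so that the control flow can be read on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Report:
    """Structured feedback from a (hypothetical) reporter."""
    success: bool
    message: str  # e.g. which object or step looks misaligned


def closed_loop(
    task: str,
    plan: Callable[[str, List[Report]], str],
    execute: Callable[[str], None],
    report: Callable[[str], Report],
    max_rounds: int = 3,  # assumed retry budget, not taken from the paper
) -> bool:
    """Generate a whole code plan, run it, check the outcome, replan on failure."""
    history: List[Report] = []
    for _ in range(max_rounds):
        code = plan(task, history)  # one planner call per round, not per step
        execute(code)               # run the complete generated plan
        feedback = report(task)     # outcome check (RGB-D-based in the paper)
        if feedback.success:
            return True
        history.append(feedback)    # failure details inform the next plan
    return False
```

Because the planner is called once per round rather than once per action, the number of LLM calls grows with the number of replans instead of the number of steps, which is the source of the reduced overhead the summary mentions.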

Takeaways, Limitations

Takeaways:
Presents a new approach to long-horizon object manipulation in which an LLM directly generates executable code, with no pretrained low-level controller.
Produces robust, generalizable task plans through chain-of-thought (CoT)-guided few-shot learning with incrementally structured examples.
The closed-loop framework and RGB-D-based feedback enable error correction and replanning, eliminating per-step inference and limiting error accumulation.
Achieves state-of-the-art performance on more than 30 tasks across a variety of environments.
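As a rough illustration of CoT-guided few-shot prompting, the sketch below assembles a prompt from worked examples that each pair a task with its reasoning and code. It rests on an assumption: "incrementally structured" is read here as ordering examples from simple to complex, with code length as a crude complexity proxy. The summary does not specify the paper's actual prompt format, and `Example` and `build_prompt` are hypothetical names.

```python
from typing import List, Tuple

# (task description, reasoning text, code plan); a hypothetical example format
Example = Tuple[str, str, str]


def build_prompt(examples: List[Example], task: str) -> str:
    """Order examples from short to long code plans, then append the new task."""
    # Assumption: the number of code lines stands in for task complexity.
    ordered = sorted(examples, key=lambda e: len(e[2].splitlines()))
    parts = []
    for ex_task, reasoning, code in ordered:
        parts.append(f"Task: {ex_task}\nReasoning: {reasoning}\nCode:\n{code}")
    # End on an open "Reasoning:" slot so the model reasons before writing code.
    parts.append(f"Task: {task}\nReasoning:")
    return "\n\n".join(parts)
```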
Limitations:
The framework depends on the underlying LLM, so the LLM's limitations directly affect system performance.
Feedback relies on an RGB-D sensor, so operation may suffer if the sensor is degraded or unavailable.
Generalization in real-world environments requires further experiments and validation.
The computational cost of the LLM can be significant, and real-time performance needs further study.