Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

ManiAgent: An Agentic Framework for General Robotic Manipulation

Created by
  • Haebom

Author

Yi Yang, Kefan Gu, Yuqing Wen, Hebei Li, Yucheng Zhao, Tiancai Wang, Xudong Liu

Outline

To address the complex reasoning and long-horizon task-planning challenges faced by Vision-Language-Action (VLA) models, the authors propose ManiAgent, an agentic architecture that converts task descriptions and environmental inputs into robot manipulation actions end-to-end. The architecture handles complex manipulation scenarios by using inter-agent communication for environmental perception, subtask decomposition, and action generation. ManiAgent achieves an 86.8% success rate on the SimplerEnv benchmark and a 95.8% success rate on real-world pick-and-place tasks. It also enables efficient data collection: VLA models trained on the collected data perform comparably to models trained on human-annotated datasets.
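The perception → subtask decomposition → action generation pipeline described above can be sketched as a chain of communicating agents. This is a minimal illustrative sketch, not the paper's implementation: all class names, message formats, and the rule-based grounding logic are assumptions (the real system would back each agent with a vision-language model).

```python
# Hypothetical sketch of an agentic manipulation pipeline in the spirit of
# ManiAgent. Agent names and data formats are illustrative assumptions.

class PerceptionAgent:
    """Turns raw environment input into a structured scene description."""
    def perceive(self, observation):
        # The real system would query a perception model here; this sketch
        # simply passes through a pre-structured observation.
        return {"objects": observation["objects"],
                "target_zone": observation["target_zone"]}

class PlannerAgent:
    """Decomposes a task description into an ordered list of subtasks."""
    def plan(self, task, scene):
        subtasks = []
        for obj in scene["objects"]:
            if obj in task:  # naive grounding of task text to scene objects
                subtasks.append(("pick", obj))
                subtasks.append(("place", obj, scene["target_zone"]))
        return subtasks

class ActionAgent:
    """Converts each subtask into a low-level action command string."""
    def act(self, subtask):
        verb, *args = subtask
        return f"{verb}({', '.join(args)})"

def run_pipeline(task, observation):
    # Agents communicate by passing structured messages down the chain.
    scene = PerceptionAgent().perceive(observation)
    plan = PlannerAgent().plan(task, scene)
    return [ActionAgent().act(subtask) for subtask in plan]

actions = run_pipeline(
    task="put the red block in the bin",
    observation={"objects": ["red block", "cup"], "target_zone": "bin"},
)
print(actions)  # ['pick(red block)', 'place(red block, bin)']
```

The design point the sketch illustrates is separation of concerns: each agent solves one narrow problem and exchanges structured messages, which is what lets the framework decompose long-horizon tasks instead of asking a single end-to-end model to do everything.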

Takeaways, Limitations

Takeaways:
ManiAgent demonstrates outstanding performance in complex robotic manipulation tasks.
It handles complex tasks efficiently through inter-agent communication.
It enables efficient data collection for VLA models, which then perform comparably to models trained on human-annotated datasets.
Limitations:
The paper does not explicitly discuss its Limitations.