Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Created by
  • Haebom

Authors

Sixiang Chen, Jiaming Liu, Siyuan Qian, Han Jiang, Lily Li, Renrui Zhang, Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

Outline

This paper addresses mobile manipulation, which enables language-conditioned robot control in household tasks. Existing methods struggle to coordinate the mobile base and the manipulator for two reasons: they do not explicitly model how the mobile base influences manipulator control, and they ignore the differing perceptual requirements across modalities. The authors propose the Adaptive Coordination Diffusion Transformer (AC-DiT) to improve this coordination. Because the movement of the mobile base directly affects the manipulator's actions, AC-DiT introduces a mobility-to-body conditioning mechanism: the model first extracts base motion representations, which then serve as a contextual prior for predicting whole-body actions. In addition, a perception-aware multimodal conditioning strategy dynamically adjusts the fusion weights between multiple 2D images and 3D point clouds to meet the perceptual requirements of each stage of mobile manipulation. AC-DiT is evaluated through extensive experiments in both simulated and real-world environments.
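The idea of "dynamically adjusting fusion weights" between 2D and 3D inputs can be illustrated with a generic attention-style weighting. The sketch below is not AC-DiT's actual mechanism; it substitutes a plain scaled dot-product softmax over pooled per-modality feature vectors. The function `fuse_modalities`, the query vector (standing in for a hypothetical stage or task embedding), and the modality names `rgb_head`, `rgb_wrist`, and `point_cloud` are all illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fuse_modalities(
    query: List[float], features: Dict[str, List[float]]
) -> Tuple[Dict[str, float], List[float]]:
    """Hypothetical adaptive fusion (assumption, not the paper's design).

    Scores each modality's pooled feature against a query vector with a
    scaled dot product, turns the scores into softmax weights, and returns
    the weights plus the weighted sum of the features.
    """
    names = list(features)
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    scores = [dot(query, features[n]) * scale for n in names]
    weights = softmax(scores)
    fused = [0.0] * dim
    for w, n in zip(weights, names):
        for i, v in enumerate(features[n]):
            fused[i] += w * v
    return dict(zip(names, weights)), fused


# Toy pooled features, one per modality (made-up values for illustration).
FEATURES = {
    "rgb_head": [1.0, 0.0, 0.0],
    "rgb_wrist": [0.0, 1.0, 0.0],
    "point_cloud": [0.0, 0.0, 1.0],
}

# A query resembling the head-camera feature (e.g. a navigation-like stage).
nav_weights, nav_fused = fuse_modalities([2.0, 0.0, 0.0], FEATURES)
# A query resembling the point-cloud feature (e.g. a grasping-like stage).
grasp_weights, grasp_fused = fuse_modalities([0.0, 0.0, 2.0], FEATURES)
print(nav_weights)
print(grasp_weights)
```

With these toy inputs, the first query gives `rgb_head` the largest weight and the second gives `point_cloud` the largest weight, which mirrors the intuition that different stages of a mobile manipulation task may rely on different modalities. A learned model would produce the query and features from networks rather than fixed vectors.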

Takeaways, Limitations

Takeaways:
Improves the accuracy and stability of mobile manipulation by explicitly modeling the movement of the mobile base.
Adaptively weights the modalities needed at each stage of a task, improving perception efficiency.
Real-world experiments demonstrate the practicality of the proposed method.
Presents a new architecture for end-to-end mobile manipulation.
Limitations:
The generality of the method (its applicability to diverse environments and tasks) requires further study.
Real-time inference speed is not evaluated.
Further validation of robot control performance in complex environments is needed.