Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
The summaries on this page are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Multi-Modal Manipulation via Multi-Modal Policy Consensus

Created by
  • Haebom

Author

Haonan Chen, Jiaming Xu, Hongyu Chen, Kaiwen Hong, Binghao Huang, Chaoqi Liu, Jiayuan Mao, Yunzhu Li, Yilun Du, Katherine Driggs-Campbell

Outline

Effectively integrating diverse sensory information is crucial for robotic manipulation. However, existing feature-concatenation approaches suffer from dominant modalities, such as vision, overwhelming tactile signals that are essential for contact-rich tasks. This paper proposes a method that decomposes the policy into a set of diffusion models, each specialized for one representation (e.g., vision or touch), and adaptively combines their contributions with a router network. This design also allows new representations to be integrated incrementally. The approach outperforms feature-concatenation baselines on simulated RLBench tasks and on real-world tasks such as occluded object grasping, in-hand spoon reorientation, and puzzle insertion. The authors further demonstrate robustness to physical disturbances and sensor corruption, and present an importance analysis showing that the policy adaptively shifts its reliance between sensory inputs.
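The core idea (several per-modality denoisers whose outputs are blended by a router network) can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy: the linear "experts" stand in for the paper's diffusion models, and all class names, shapes, and the single denoising rule are invented for illustration, not taken from the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class Expert:
    """Toy stand-in for a per-modality diffusion denoiser (hypothetical)."""
    def __init__(self, feat_dim, act_dim):
        self.W = rng.normal(size=(act_dim, feat_dim)) * 0.1
    def predict_noise(self, noisy_action, feat):
        # A real expert would run a learned denoising network here.
        return noisy_action * 0.5 + self.W @ feat

class Router:
    """Maps modality features to softmax weights over the experts."""
    def __init__(self, feat_dims):
        self.w = [rng.normal(size=d) * 0.1 for d in feat_dims]
    def weights(self, feats):
        logits = np.array([w @ f for w, f in zip(self.w, feats)])
        return softmax(logits)

def consensus_denoise_step(noisy_action, feats, experts, router):
    alphas = router.weights(feats)  # adaptive per-modality weights
    eps = sum(a * e.predict_noise(noisy_action, f)
              for a, e, f in zip(alphas, experts, feats))
    return noisy_action - eps, alphas  # one reverse-diffusion step

# Usage: a vision feature (dim 8) and a tactile feature (dim 4), 3-D action.
feats = [rng.normal(size=8), rng.normal(size=4)]
experts = [Expert(8, 3), Expert(4, 3)]
router = Router([8, 4])
action = rng.normal(size=3)
for _ in range(10):
    action, alphas = consensus_denoise_step(action, feats, experts, router)
```

Because the router outputs a convex combination, a new modality can be added by appending one expert and extending the router's logits, which is how gradual integration of new representations becomes possible.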

Takeaways, Limitations

Takeaways:
A novel approach to effectively integrating diverse sensory information (diffusion models combined via a router network).
Enables progressive integration of new sensory modalities.
Demonstrates strong performance in both simulation and real-world environments.
Robust to physical disturbances and sensor corruption.
Importance analysis shows the policy adaptively shifts between sensory inputs.
Limitations:
No specific limitations are stated in the abstract. (For details, please refer to the paper directly.)