Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation

Created by
  • Haebom

Authors

Ziyan Zhao, Ke Fan, He-Yang Xu, Ning Qiao, Bo Peng, Wenlong Gao, Dongjiang Li, Hui Shen

Overview

AnchorDP3 is a diffusion policy framework for dual-arm robotic manipulation that achieves state-of-the-art performance in heavily randomized environments. It integrates three key innovations:

(1) Simulator-supervised semantic segmentation: task-critical objects are explicitly segmented within the point cloud using ground truth rendered by the simulator, which provides robust condition priors.
(2) Task-condition feature encoder: a lightweight module that processes the augmented point cloud for each task, enabling efficient multi-task learning via shared diffusion-based motion experts.
(3) Condition-anchored key pose diffusion: dense trajectory prediction is replaced with sparse, geometrically meaningful motion anchors, such as pre-grasp poses and grasp poses anchored directly to the context, which dramatically simplifies the prediction space. The motion experts are also required to predict robot joint angles and end-effector poses simultaneously, which exploits geometric consistency to accelerate convergence and improve accuracy.

Trained on large-scale, procedurally generated simulation data, AnchorDP3 achieves an average success rate of 98.7% on the RoboTwin benchmark across a wide variety of tasks under extreme randomization of objects, clutter, table height, illumination, and background. When integrated with the RoboTwin real-to-sim pipeline, the framework has the potential to generate deployable visuomotor policies fully autonomously from scenes and instructions alone, eliminating human demonstrations from manipulation skill learning.
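
The geometric consistency behind predicting joint angles and end-effector poses together can be illustrated with a toy example. The sketch below is not the paper's model or loss: it assumes a hypothetical planar two-link arm with made-up link lengths, and the function names (`forward_kinematics`, `consistency_error`) are illustrative only. It shows the underlying idea that joint angles imply an end-effector pose through forward kinematics, so the two predictions can be checked against each other.

```python
import math

# Hypothetical link lengths in metres for a toy planar 2-link arm (not from the paper).
LINK_LENGTHS = (0.30, 0.25)


def forward_kinematics(joint_angles, link_lengths=LINK_LENGTHS):
    """Return (x, y, heading) of the end effector of a planar serial arm."""
    x = y = heading = 0.0
    for angle, length in zip(joint_angles, link_lengths):
        heading += angle                  # joint angles accumulate along the chain
        x += length * math.cos(heading)   # each link extends along the current heading
        y += length * math.sin(heading)
    return x, y, heading


def consistency_error(pred_joints, pred_position):
    """Distance between a predicted end-effector position and the position
    implied by the predicted joint angles (zero when the two agree)."""
    fk_x, fk_y, _ = forward_kinematics(pred_joints)
    return math.hypot(fk_x - pred_position[0], fk_y - pred_position[1])


if __name__ == "__main__":
    # A consistent pair: a fully stretched arm reaches x = 0.30 + 0.25.
    print(consistency_error((0.0, 0.0), (0.55, 0.0)))
    # An inconsistent pair: the predicted position is offset from the implied one.
    print(consistency_error((0.0, 0.0), (0.55, 0.10)))
```

In a setup like this, a large error flags a key pose whose two representations disagree. How AnchorDP3 actually uses this redundancy during training is not detailed in this summary.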

Takeaways, Limitations

Takeaways:
Achieves state-of-the-art performance for dual-arm robotic manipulation in heavily randomized environments.
Enables efficient and accurate multi-task learning via simulator-supervised semantic segmentation, task-condition feature encoders, and condition-anchored key pose diffusion.
Demonstrates the potential to learn robot manipulation skills from scenes and instructions alone, without human demonstrations.
Reaches a 98.7% average success rate on the RoboTwin benchmark.
Limitations:
The approach currently depends on the simulation environment, and additional validation is required before it can be applied in real environments.
Training on large-scale simulation data is expected to require substantial computing resources.
The complexity and unpredictability of real-world environments may not be fully captured.
The framework depends on the RoboTwin real-to-sim pipeline, so limitations of that pipeline may affect the performance of AnchorDP3.