Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation

Created by
  • Haebom

Authors

Yihe Tang, Wenlong Huang, Yingke Wang, Chengshu Li, Roy Yuan, Ruohan Zhang, Jiajun Wu, Li Fei-Fei

Outline

This paper argues that understanding fine-grained object affordances is essential for robots manipulating objects in unstructured environments. Existing visual affordance prediction methods either rely on manually annotated data or are restricted to a predefined set of tasks. In response, the authors present Unsupervised Affordance Distillation (UAD), a method that distills affordance knowledge from foundation models into a task-conditioned affordance model without any manual annotation. Leveraging the complementary strengths of large vision models and vision-language models, UAD automatically annotates a large dataset of (instruction, visual affordance) pairs. By training only a lightweight task-conditioned decoder on top of frozen features, UAD generalizes to real-world robotic scenes and diverse human activities, despite being trained only on rendered objects in simulation. Using the affordances provided by UAD as the observation space, the authors propose an imitation learning policy that shows promising generalization to unseen object instances, object categories, and variations in task instructions after training on as few as 10 demonstrations.
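The idea of training a lightweight task-conditioned decoder on top of frozen features can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `TaskConditionedDecoder` class, the FiLM-style scale-and-shift conditioning, the mean-squared-error loss, and the toy numbers are all hypothetical. The point is only to show the separation between fixed visual features and a small decoder that turns them into a per-pixel affordance map for a given instruction embedding.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TaskConditionedDecoder:
    """Hypothetical lightweight decoder; the only trainable part in this sketch."""

    def __init__(self, gamma_w, beta_w, head_w, head_b=0.0):
        self.gamma_w = gamma_w  # (D x T): task embedding -> per-channel scale
        self.beta_w = beta_w    # (D x T): task embedding -> per-channel shift
        self.head_w = head_w    # (D,): linear head producing one logit per pixel
        self.head_b = head_b

    def __call__(self, frozen_features, task_embedding):
        """Map frozen per-pixel features (N x D) to N affordance scores in (0, 1)."""
        gamma = matvec(self.gamma_w, task_embedding)
        beta = matvec(self.beta_w, task_embedding)
        scores = []
        for feat in frozen_features:
            # FiLM-style modulation (assumed), followed by a ReLU.
            hidden = [max(0.0, g * f + b) for g, f, b in zip(gamma, feat, beta)]
            logit = sum(w * h for w, h in zip(self.head_w, hidden)) + self.head_b
            scores.append(sigmoid(logit))
        return scores


def distillation_loss(predicted, target):
    """Mean squared error against automatically generated labels (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Toy data: 3 "pixels" with 2-D frozen features and a 2-D task embedding.
frozen_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
task_embedding = [1.0, 0.0]
decoder = TaskConditionedDecoder(
    gamma_w=[[1.0, 0.0], [0.0, 1.0]],
    beta_w=[[0.0, 0.0], [0.0, 0.0]],
    head_w=[2.0, 2.0],
    head_b=-1.0,
)
affordance_map = decoder(frozen_features, task_embedding)
pseudo_labels = [1.0, 0.0, 1.0]  # stand-in for automatically generated annotations
loss = distillation_loss(affordance_map, pseudo_labels)
```

In this toy setup the task embedding switches on the first feature channel, so pixels 1 and 3 score higher than pixel 2. In a real system the frozen features would come from a pretrained vision model and only the decoder parameters would be updated to reduce the loss.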

Takeaways, Limitations

Takeaways:
Enables affordance learning by automatically annotating a large dataset, with no manual annotation.
Leverages foundation models to generalize to real-world environments while training only on simulation data.
Demonstrates generalization to novel objects and task instructions from only a small number of demonstrations.
Suggests applicability to real robot manipulation when combined with an imitation learning policy.
Limitations:
Reliance on simulation data may leave a domain gap with real-world environments.
Performance depends on the underlying foundation models, so their limitations may carry over to UAD.
The limits of generalization across different objects and tasks require further study.