Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Robot Operation of Home Appliances by Reading User Manuals

Created by
  • Haebom

Authors

Jian Zhang, Hanbo Zhang, Anxing Xiao, David Hsu

Outline

This paper presents ApBot, a robot system that operates novel home appliances by “reading” their user manuals. ApBot faces three challenges: (i) inferring goal-conditioned partial policies from the unstructured text of a user manual, (ii) grounding those policies to the physical appliance, and (iii) executing them reliably over many steps despite compounding errors. To address these challenges, ApBot uses a large vision-language model (VLM) to construct a structured, symbolic model of the appliance from its manual, and visually grounds the model's symbolic actions to control panel elements. It then closes the loop by updating the model based on visual feedback. Experiments across a variety of simulated and real appliances show that ApBot achieves consistent and statistically significant improvements in task success rate over state-of-the-art large VLMs used directly as control policies. These results suggest that structured internal representations play an important role in robust robot operation of home appliances, especially complex ones.
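The closed-loop idea described above can be illustrated with a small Python sketch. This is not ApBot's implementation: the names `Variable`, `SimulatedAppliance`, and `operate` are hypothetical, the appliance is a toy simulation, and the "visual feedback" is simply the value returned by a simulated button press. It assumes a single setting that cycles through values each time a button is pressed, and a manual-derived model that is missing one of those values.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """Hypothetical symbolic model of one setting that cycles on each press."""
    name: str
    values: list      # ordered values as believed from the manual
    index: int = 0    # believed current position in `values`

    @property
    def current(self):
        return self.values[self.index]

    def predict_press(self):
        """Value the model expects after one press (wraps around)."""
        return self.values[(self.index + 1) % len(self.values)]


class SimulatedAppliance:
    """Toy stand-in for the real device and its display (an assumption)."""

    def __init__(self, true_values):
        self.true_values = true_values
        self.index = 0

    def press(self):
        """Press the button once and return the value now shown."""
        self.index = (self.index + 1) % len(self.true_values)
        return self.true_values[self.index]


def operate(var, target, press_and_observe, max_steps=20):
    """Press until the target is observed, repairing the model on mismatch."""
    log = []
    for _ in range(max_steps):
        if var.current == target:
            return True, log
        expected = var.predict_press()
        observed = press_and_observe()
        if observed != expected:
            log.append(("mismatch", expected, observed))
            if observed not in var.values:
                # The manual-derived model missed this value: add it
                # right after the value we believed we were at.
                var.values.insert(var.index + 1, observed)
        # Re-synchronize the believed state with what was observed.
        var.index = var.values.index(observed)
    return var.current == target, log


# The model read from the manual omits "medium"; the real device has it.
model = Variable("fan_speed", ["off", "low", "high"])
appliance = SimulatedAppliance(["off", "low", "medium", "high"])
ok, log = operate(model, "high", appliance.press)
print(ok, model.values, log)
```

In this toy run the second press shows a value the model did not predict, so the model is corrected before execution continues. An open-loop policy that trusted the original model would have stopped one press early.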

Takeaways, Limitations

Takeaways:
Demonstrates that a robotic system can read user manuals and use them to operate home appliances.
Shows that building a structured symbolic model with a large vision-language model is an effective approach.
Suggests that closing the loop with visual feedback makes multi-step task execution more reliable.
Emphasizes the importance of structured internal representations for operating complex home appliances.
Limitations:
Depends on how accurately the user manual is interpreted.
Generalization across different appliance types and user manual formats may be limited.
Limited ability to cope with unpredictability and errors in real environments.
Performance may degrade because of differences between simulated and real environments.