This paper presents ApBot, a robotic system that improves the ability of household robots to operate a variety of home appliances. ApBot operates a new home appliance by “reading” its user manual. Doing so requires inferring a goal-conditioned sub-policy from the unstructured text of the user manual, grounding it on the physical device, and executing it reliably over multiple steps despite accumulated errors. To address these challenges, ApBot uses a large vision-language model (VLM) to construct a structured symbolic model of the device from the user manual, and visually grounds symbolic actions to control panel elements. It then closes the loop by updating the model based on visual feedback. Experimental results show that, across a variety of simulated and real devices, ApBot achieves consistent and statistically significant improvements in task success rates over state-of-the-art large VLMs used directly as control policies. These results suggest that structured internal representations play an important role in the robotic operation of complex household devices.
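To make the closed-loop idea concrete, the following minimal sketch shows one way a symbolic device model could be checked against observations and revised when they disagree. It is an illustration under stated assumptions, not ApBot's implementation: the names `ApplianceModel`, `SimulatedDevice`, and `operate` are hypothetical, the model is reduced to buttons that cycle a variable through an ordered list of values, and a simulated device stands in for the VLM-based perception and grounding.

```python
from dataclasses import dataclass


@dataclass
class ApplianceModel:
    """Hypothetical symbolic model: each button cycles one variable
    through an ordered list of values (as a manual might describe it)."""
    values: dict[str, list[str]]   # variable -> assumed value order
    button_for: dict[str, str]     # variable -> button that advances it

    def predict(self, variable, current):
        """Value the model expects after one press, or None if unknown."""
        seq = self.values[variable]
        if current not in seq:
            return None
        return seq[(seq.index(current) + 1) % len(seq)]

    def update(self, variable, current, observed):
        """Revise the assumed order so `observed` directly follows `current`.
        Assumes observed != current."""
        seq = self.values[variable]
        if current not in seq:
            seq.append(current)
        if observed in seq:
            seq.remove(observed)
        seq.insert(seq.index(current) + 1, observed)


class SimulatedDevice:
    """Stand-in for the real appliance plus visual feedback (assumption)."""

    def __init__(self, true_values, button_for, state):
        self.true_values = true_values
        self.variable_for = {b: v for v, b in button_for.items()}
        self.state = dict(state)

    def press(self, button):
        variable = self.variable_for[button]
        seq = self.true_values[variable]
        nxt = (seq.index(self.state[variable]) + 1) % len(seq)
        self.state[variable] = seq[nxt]

    def observe(self, variable):
        return self.state[variable]


def operate(model, device, goal, max_presses=20):
    """Press buttons until every goal variable matches its target,
    revising the model whenever an observation contradicts its prediction."""
    presses = 0
    for variable, target in goal.items():
        current = device.observe(variable)
        while current != target:
            if presses >= max_presses:
                return False
            predicted = model.predict(variable, current)
            device.press(model.button_for[variable])
            presses += 1
            observed = device.observe(variable)
            # A changed value that differs from the prediction means the
            # manual-derived model was wrong; an unchanged value is treated
            # as a failed press and simply retried.
            if observed != predicted and observed != current:
                model.update(variable, current, observed)
            current = observed
    return True


# The manual-derived model assumes low -> medium -> high,
# but the (simulated) device actually cycles low -> high -> medium.
buttons = {"power": "power_button"}
model = ApplianceModel(values={"power": ["low", "medium", "high"]},
                       button_for=buttons)
device = SimulatedDevice(true_values={"power": ["low", "high", "medium"]},
                         button_for=buttons, state={"power": "low"})
ok = operate(model, device, goal={"power": "medium"})
```

In this toy run the first press exposes the mismatch between the assumed and actual value order, the model is corrected, and the second press reaches the goal. A VLM used directly as a policy has no such explicit state to compare against, which is the contrast the abstract draws.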