Daily Arxiv

This page curates AI-related papers from around the world.
All summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Exploring Object Status Recognition for Recipe Progress Tracking in Non-Visual Cooking

Created by
  • Haebom

Authors

Franklin Mingzhe Li, Kaitlyn Ng, Bin Zhu, Patrick Carrington

Outline

This paper proposes OSCAR (Object Status Context Awareness for Recipes), a technique that uses object status recognition to track recipe progress, with the goal of building cooking assistance systems for people with vision impairments. OSCAR supports real-time step tracking by combining four components: recipe parsing, object status extraction, visual alignment of observations with cooking steps, and temporal causal modeling. The authors evaluate OSCAR on 173 cooking videos and on a real-world dataset recorded in the homes of 12 visually impaired participants. They find that object status recognition improves the step prediction accuracy of vision-language models. They also analyze how real-world factors such as implicit tasks, camera placement, and lighting affect performance. The paper contributes a context-aware recipe progress tracking pipeline, an annotated real-world non-visual cooking dataset, and design insights for future context-aware cooking assistance systems.
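The idea of aligning observed object states with recipe steps can be illustrated with a small sketch. This is not OSCAR's implementation: the `RecipeStep` type, the `(object, status)` pairs, the overlap score, and the 0.5 threshold are all assumptions made for illustration, and a simple "stay or advance by one step" rule stands in for the paper's temporal causal modeling. In a real system the observed states would come from a vision-language model rather than hand-written sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipeStep:
    """Hypothetical parsed recipe step with the object states it should produce."""
    text: str
    expected_states: frozenset  # (object, status) pairs, e.g. ("egg", "cracked")


def alignment_score(observed, step):
    """Fraction of the step's expected object states present in one frame."""
    if not step.expected_states:
        return 0.0
    return len(observed & step.expected_states) / len(step.expected_states)


def track_steps(steps, frames, threshold=0.5):
    """Predict one step index per frame.

    Assumed temporal constraint: the tracker may only stay on the current
    step or advance to the next one, and it advances only when the next
    step matches the frame well and better than the current step does.
    """
    current = 0
    predictions = []
    for observed in frames:
        nxt = current + 1
        if nxt < len(steps):
            next_score = alignment_score(observed, steps[nxt])
            current_score = alignment_score(observed, steps[current])
            if next_score >= threshold and next_score > current_score:
                current = nxt
        predictions.append(current)
    return predictions


# Hand-written example data (hypothetical, not from the paper's dataset).
steps = [
    RecipeStep("Crack the eggs", frozenset({("egg", "cracked")})),
    RecipeStep("Whisk the eggs", frozenset({("egg", "whisked")})),
    RecipeStep("Cook in a hot pan", frozenset({("egg", "cooked"), ("pan", "hot")})),
]
frames = [
    {("egg", "cracked")},
    {("egg", "whisked")},
    {("egg", "whisked"), ("pan", "hot")},  # pan is heating, eggs not yet cooked
    {("egg", "cooked"), ("pan", "hot")},
]
predictions = track_steps(steps, frames)
print(predictions)  # [0, 1, 1, 2]
```

In the third frame the pan is already hot, but the whisked eggs still match the current step better, so the tracker stays on step 1. This is the kind of ambiguity that object status information is meant to help resolve.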

Takeaways, Limitations

Takeaways:
Presents a new pipeline (OSCAR) that supports independent cooking by visually impaired people.
Shows that object status recognition can improve the accuracy of recipe progress tracking.
Builds and provides an annotated non-visual cooking dataset recorded in real-world home environments.
Analyzes how real-world conditions (lighting, camera placement, implicit tasks) affect system performance.
Limitations:
The real-world evaluation dataset is relatively small (recordings from 12 participants).
Generalization to implicit tasks and to a wider range of cooking environments still needs to be verified.
Long-term usability evaluations with visually impaired users may be lacking.