Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini and maintained on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models

Created by
  • Haebom

Author

Andy Bonnetto, Haozhe Qi, Franklin Leong, Matea Tashkovska, Mahdi Rad, Solaiman Shokur, Friedhelm Hummel, Silvestro Micera, Marc Pollefeys, Alexander Mathis

Outline

EPFL-Smart-Kitchen-30 is a multi-view action dataset comprising 29.7 hours of recordings from 16 subjects cooking four different recipes in a kitchen environment. Nine static RGB-D cameras, inertial measurement units (IMUs), and a HoloLens 2 headset were used to capture 3D hand, body, and eye movements. The data is densely annotated, with 33.78 action segments per minute, and comes with four benchmarks (a vision-language benchmark, a semantic text-to-motion generation benchmark, a multimodal action recognition benchmark, and a pose-based action segmentation benchmark) to support research in action understanding and modeling. The data and code are publicly available.
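To make the shape of such a densely annotated, multimodal segment concrete, here is a minimal sketch in Python. The class name, field names, and array shapes are illustrative assumptions only; they do not reflect the dataset's actual file format or API.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical structure of one annotated action segment; shapes are assumptions.
@dataclass
class KitchenActionSegment:
    rgbd_frames: np.ndarray   # (9 views, T, H, W, 4): RGB + depth per static camera
    imu: np.ndarray           # (T, 6): accelerometer + gyroscope readings
    gaze: np.ndarray          # (T, 2): normalized eye-gaze coordinates (HoloLens 2)
    body_pose: np.ndarray     # (T, num_body_joints, 3): 3D body kinematics
    hand_pose: np.ndarray     # (T, 2, num_hand_joints, 3): left/right 3D hand kinematics
    verb: str                 # e.g. "cut"
    noun: str                 # e.g. "carrot"
    start_s: float            # segment start time in seconds
    end_s: float              # segment end time in seconds

def segments_per_minute(segments: list, total_minutes: float) -> float:
    """Annotation density, analogous to the reported 33.78 segments per minute."""
    return len(segments) / total_minutes
```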

Takeaways, Limitations

Takeaways:
  • Advances action understanding research by providing a rich dataset spanning multiple modalities (RGB-D, IMU, eye gaze, body and hand kinematics).
  • Provides four benchmarks that enable diverse behavior analysis and modeling studies.
  • Data collected in a naturalistic environment (a kitchen) enables ecologically valid behavioral research.
  • The open dataset and code support reproducibility and sharing of research.
Limitations:
  • The number of subjects (16) is relatively small.
  • The dataset is limited to specific cooking activities, which may limit generalizability.
  • Data was collected in a single kitchen, so generalization to other environments remains to be verified.