The EPFL-Smart-Kitchen-30 dataset is a multi-view action dataset comprising 29.7 hours of recordings from 16 subjects cooking four different recipes in a kitchen environment. Nine static RGB-D cameras, an inertial measurement unit (IMU), and a HoloLens 2 headset were used to capture 3D hand, body, and eye movements. The data is densely annotated, averaging 33.78 action segments per minute, and comes with four benchmarks: vision-to-speech, semantic text-to-motion generation, multimodal action recognition, and pose-based action segmentation, supporting research in action understanding and modeling. The data and code are publicly available.
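To make the annotation density concrete, the sketch below derives the total number of action segments implied by the figures quoted above. The roughly 60,000-segment total is computed here from the stated duration and density, not quoted from the source.

```python
# Back-of-the-envelope check of the annotation density reported above.
# The two constants are taken from the dataset summary; the total is derived.

HOURS_RECORDED = 29.7         # total recording time across all subjects
SEGMENTS_PER_MINUTE = 33.78   # average annotation density

total_minutes = HOURS_RECORDED * 60
total_segments = SEGMENTS_PER_MINUTE * total_minutes

print(f"{total_minutes:.0f} minutes of recordings")
print(f"~{total_segments:,.0f} annotated action segments")  # ~60,200
```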