In this paper, we propose OCK, a novel model that leverages object-centric kinematics for dynamic video prediction. While existing object-centric transformer models focus mainly on object appearance, OCK explicitly models not only appearance attributes such as size, shape, and color, but also kinematic attributes such as position, velocity, and acceleration. This is essential for modeling dynamic interactions between objects and maintaining temporal consistency in complex environments. By introducing an object kinematics component integrated with object slots, OCK enables spatiotemporal prediction of complex object interactions over long video sequences. It achieves strong performance in scenes with complex object properties and motions, and shows potential for broader application to vision-related dynamic learning tasks.
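To make the idea concrete, the following is a minimal sketch of how kinematic cues (position, velocity, acceleration) could be derived by finite differences over per-frame object positions and concatenated with appearance slot features. The function names, tensor shapes, and the concatenation-based fusion are illustrative assumptions for exposition, not OCK's actual architecture.

```python
import numpy as np

def kinematic_features(positions):
    """Derive velocity and acceleration from per-frame object positions
    via first-order finite differences.

    positions: array of shape (T, N, 2) for T frames and N objects.
    Returns an array of shape (T, N, 6): position, velocity, acceleration.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Prepend the first frame so the diff keeps length T (velocity at t=0 is zero).
    velocity = np.diff(positions, axis=0, prepend=positions[:1])       # (T, N, 2)
    acceleration = np.diff(velocity, axis=0, prepend=velocity[:1])     # (T, N, 2)
    return np.concatenate([positions, velocity, acceleration], axis=-1)

def fuse_with_slots(slots, positions):
    """Concatenate kinematic features onto appearance slots.

    slots: array of shape (T, N, D) of per-object appearance features.
    Returns an array of shape (T, N, D + 6).
    """
    return np.concatenate([slots, kinematic_features(positions)], axis=-1)

# Toy example: 8 frames, 3 objects, 16-dim appearance slots.
T, N, D = 8, 3, 16
slots = np.random.randn(T, N, D)
positions = np.cumsum(np.random.randn(T, N, 2), axis=0)  # random trajectories
fused = fuse_with_slots(slots, positions)
print(fused.shape)  # (8, 3, 22)
```

In an actual model the fused per-object tokens would then be fed to a transformer that predicts future slots; the sketch only shows how explicit kinematic information can augment appearance-only slot representations.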