Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics

Created by
  • Haebom

Authors

Yeon-Ji Song, Jaein Kim, Suhyung Choi, Jin-Hwa Kim, Byoung-Tak Zhang

Outline

This paper proposes OCK, a model for dynamic video prediction that leverages object-centric kinematics. Existing object-centric transformer models focus mainly on object appearance. OCK explicitly models both appearance attributes (size, shape, color) and kinematic attributes (position, velocity, acceleration). The kinematic attributes matter for modeling dynamic interactions between objects and for maintaining temporal consistency in complex environments. By introducing an object kinematics component that is integrated with object slots, OCK supports spatiotemporal prediction of complex object interactions over long video sequences. The authors report strong performance on scenes with complex object properties and motions, and suggest the approach could apply to other vision-related dynamics learning tasks.
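To make the idea of kinematic attributes attached to object slots concrete, here is a minimal sketch. It is not the authors' implementation: the summary does not say how OCK computes or fuses kinematics, so the finite-difference estimates, the concatenation step, and the names `finite_difference`, `kinematic_features`, and `attach_to_slots` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def finite_difference(seq: Sequence[Vec], dt: float = 1.0) -> List[Vec]:
    """Backward difference of a sequence of 2D vectors.

    The first entry is set to zero because it has no predecessor.
    """
    if not seq:
        return []
    out: List[Vec] = [(0.0, 0.0)]
    for prev, cur in zip(seq, seq[1:]):
        out.append(((cur[0] - prev[0]) / dt, (cur[1] - prev[1]) / dt))
    return out


def kinematic_features(positions: Sequence[Vec], dt: float = 1.0) -> List[List[float]]:
    """Per-frame [x, y, vx, vy, ax, ay] for a single object (hypothetical layout)."""
    velocities = finite_difference(positions, dt)
    accelerations = finite_difference(velocities, dt)
    return [[*p, *v, *a] for p, v, a in zip(positions, velocities, accelerations)]


def attach_to_slots(
    slots: Sequence[Sequence[float]], kinematics: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Concatenate each frame's appearance slot with its kinematic vector.

    Concatenation is an assumed fusion choice, used only for this sketch.
    """
    return [list(s) + list(k) for s, k in zip(slots, kinematics)]


# One object moving along x with increasing speed, observed for three frames.
positions = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
slots = [[0.5, 0.1], [0.5, 0.1], [0.5, 0.1]]  # toy appearance vectors
features = kinematic_features(positions)
combined = attach_to_slots(slots, features)
```

A downstream predictor would then consume `combined` in place of appearance-only slots, which is the general shape of the idea described above; the actual model may estimate and fuse these quantities quite differently.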

Takeaways, Limitations

Takeaways:
  • Improves dynamic video prediction by using object-centric kinematics.
  • Models complex object interactions and long video sequences effectively.
  • Offers a new approach to visual dynamics learning tasks.
Limitations:
  • The generalization performance of the proposed model needs further evaluation.
  • Applicability to diverse real-world environments remains to be verified.
  • Computational cost and model complexity have not been analyzed.