Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling

Created by
  • Haebom

Authors

Peiming Li, Ziyi Wang, Yulin Yuan, Hong Liu, Xiangming Meng, Junsong Yuan, Mengyuan Liu

Outline

This paper proposes the Unified Spatio-Temporal State Space Model (UST-SSM) to address the spatio-temporal disorder of point cloud videos. UST-SSM extends the Selective State Space Model (SSM) to point cloud videos through three components. Spatio-Temporal Selective Scanning (STSS) reorganizes unordered points into semantically aware sequences through prompt-based clustering. Spatio-Temporal Structure Aggregation (STSA) compensates for missing 4D geometric and motion information. Temporal Interaction Sampling (TIS) strengthens fine-grained temporal dependencies by using non-anchor frames and expanding receptive fields. Experiments on the MSR-Action3D, NTU RGB+D, and Synthia 4D datasets demonstrate the effectiveness of the proposed method. The source code is publicly available.
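To make the idea behind prompt-based clustering more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each point carries a frame index and a feature vector, that "prompts" are plain vectors, and that similarity is a dot product. The names `assign_cluster` and `cluster_scan_order` are hypothetical and chosen only for illustration. The sketch shows how points that are far apart in time but similar in feature space can end up adjacent in a 1D sequence.

```python
from typing import List, Sequence, Tuple

# Assumption: a point is (frame index, feature vector). The real model
# works on learned features; these toy values are illustrative only.
Point = Tuple[int, Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def assign_cluster(feature: Sequence[float],
                   prompts: Sequence[Sequence[float]]) -> int:
    """Return the index of the prompt most similar to the feature.

    Ties go to the lowest prompt index.
    """
    scores = [dot(feature, p) for p in prompts]
    return scores.index(max(scores))


def cluster_scan_order(points: List[Point],
                       prompts: Sequence[Sequence[float]]) -> List[int]:
    """Return point indices sorted by (cluster, frame, original index)."""
    keys = [
        (assign_cluster(feat, prompts), frame, i)
        for i, (frame, feat) in enumerate(points)
    ]
    return [i for _, _, i in sorted(keys)]


# Toy example: two hypothetical prompts, four points over two frames.
prompts = [[1.0, 0.0], [0.0, 1.0]]
points = [
    (0, [0.9, 0.1]),  # frame 0, closest to prompt 0
    (0, [0.2, 0.8]),  # frame 0, closest to prompt 1
    (1, [0.1, 0.7]),  # frame 1, closest to prompt 1
    (1, [0.8, 0.3]),  # frame 1, closest to prompt 0
]
order = cluster_scan_order(points, prompts)
print(order)  # [0, 3, 1, 2]
```

In this toy ordering, points 0 and 3 come from different frames but share a cluster, so they sit next to each other in the sequence. A purely frame-by-frame scan would instead visit them in the order 0, 1, 2, 3.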

Takeaways, Limitations

Takeaways:
Presents an effective model for recognizing subtle and continuous human actions from point cloud videos.
Improves SSM performance by addressing the spatio-temporal disorder problem.
Uses the spatio-temporal information in point cloud videos effectively through the STSS, STSA, and TIS techniques.
Validates performance through experiments on multiple datasets.
Supports reproducibility by releasing the source code.
Limitations:
A detailed analysis of the computational complexity and efficiency of the proposed method is lacking.
Additional evaluation of generalization performance on various types of point cloud video data is needed.
A sensitivity analysis of the prompt-based clustering is needed.
Further research is needed to determine its applicability in real-world settings.