Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the authors and the relevant institutions. When sharing, please cite the source.

Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection

Created by
  • Haebom

Authors

Damith Chamalke Senadeera, Xiaoyun Yang, Shibo Li, Muhammad Awais, Dimitrios Kollias, Gregory Slabaugh

Outline

With the proliferation of surveillance cameras, demand for automatic violence detection is growing. CNN- and Transformer-based methods struggle with spatio-temporal feature extraction, and this paper addresses that limitation with Dual-Branch VideoMamba with Gated Class Token Fusion (GCTF), which combines a dual-branch design with a State-Space Model (SSM) backbone. One branch captures spatial features, the other focuses on temporal dynamics, and a gating mechanism fuses the two branches' class tokens. This improves the detection of violent acts even in challenging surveillance scenarios. The authors also present a new benchmark built by merging the RWF-2000, RLVS, SURV, and VioPeru datasets, and report state-of-the-art performance on this benchmark and on the DVD dataset while balancing accuracy and computational efficiency.
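The gated fusion step can be illustrated with a small sketch. The exact GCTF equations are not given in this summary, so the form below is an assumption: a per-dimension sigmoid gate, computed from the concatenated spatial and temporal class tokens, decides how much of each token enters the fused representation. The function name `gated_class_token_fusion` and the parameters `w_gate` and `b_gate` are hypothetical, and in a real model the gate weights would be learned rather than hand-set.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_class_token_fusion(
    cls_spatial: Vector,
    cls_temporal: Vector,
    w_gate: Matrix,
    b_gate: Vector,
) -> Vector:
    """Fuse two class tokens with a per-dimension sigmoid gate.

    Hypothetical form (assumed, not the paper's exact equation):
        g     = sigmoid(W_g [c_s ; c_t] + b_g)
        fused = g * c_s + (1 - g) * c_t
    W_g has shape (d, 2d) and * is element-wise.
    """
    d = len(cls_spatial)
    if len(cls_temporal) != d or len(w_gate) != d or len(b_gate) != d:
        raise ValueError("token, weight, and bias dimensions must agree")
    concat = cls_spatial + cls_temporal  # [c_s ; c_t], length 2d
    fused: Vector = []
    for i in range(d):
        if len(w_gate[i]) != 2 * d:
            raise ValueError("each gate weight row must have length 2d")
        # Gate logit for dimension i looks at both tokens jointly.
        logit = sum(w * x for w, x in zip(w_gate[i], concat)) + b_gate[i]
        g = sigmoid(logit)
        # Convex mix: g near 1 favors the spatial token, near 0 the temporal one.
        fused.append(g * cls_spatial[i] + (1.0 - g) * cls_temporal[i])
    return fused


if __name__ == "__main__":
    zero_w = [[0.0] * 4 for _ in range(2)]
    # Zero weights and bias give g = 0.5, i.e. a plain average.
    print(gated_class_token_fusion([1.0, 2.0], [3.0, 4.0], zero_w, [0.0, 0.0]))
```

With all-zero weights and bias the gate sits at 0.5 and the result is the plain average of the two tokens; a strongly positive or negative bias pushes the output toward the spatial or the temporal token. A per-dimension gate rather than a single scalar is one common design choice; the paper may use a different one.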

Takeaways, Limitations

Takeaways:
The dual-branch VideoMamba-GCTF model effectively combines spatial and temporal features to improve violence detection accuracy.
The SSM backbone improves computational efficiency, bringing processing closer to real time.
The model achieves SOTA results on the new merged benchmark (RWF-2000, RLVS, SURV, and VioPeru) and on the DVD dataset.
Limitations:
The paper does not explicitly state its limitations.
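The efficiency takeaway rests on how an SSM processes a sequence: it updates a fixed-size hidden state once per token, so cost grows linearly with sequence length rather than quadratically as in full self-attention. The sketch below is a generic, hypothetical illustration with fixed diagonal parameters `a`, `b`, `c` and a sequential loop. Mamba-style selective SSMs, on which VideoMamba builds, make these parameters input-dependent and use a hardware-efficient parallel scan, so this is not the paper's implementation.

```python
from typing import List


def diagonal_ssm_scan(
    x: List[float],
    a: List[float],
    b: List[float],
    c: List[float],
) -> List[float]:
    """Run a discretized diagonal linear SSM over a 1-D input sequence.

    Recurrence for each state dimension n (fixed, non-selective parameters):
        h_t[n] = a[n] * h_{t-1}[n] + b[n] * x_t
        y_t    = sum_n c[n] * h_t[n]
    One pass over the sequence: O(L * N) time for length L and state size N.
    """
    n_state = len(a)
    if len(b) != n_state or len(c) != n_state:
        raise ValueError("a, b, and c must have the same length")
    h = [0.0] * n_state  # hidden state starts at zero
    ys: List[float] = []
    for x_t in x:
        # Each step touches only the fixed-size state, never earlier tokens.
        h = [a[n] * h[n] + b[n] * x_t for n in range(n_state)]
        ys.append(sum(c[n] * h[n] for n in range(n_state)))
    return ys


if __name__ == "__main__":
    # An impulse decays geometrically through a single state with a = 0.5.
    print(diagonal_ssm_scan([1.0, 0.0, 0.0, 0.0], [0.5], [1.0], [1.0]))
```

Because the per-token work is constant, doubling the number of frames or patches roughly doubles the cost, which is the property the efficiency claim relies on.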