Daily Arxiv

This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions. When sharing, please cite the source.

Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems

Created by
  • Haebom

Authors

Wen-Dong Jiang, Chih-Yung Chang, Hsiang-Chuan Chang, Ji-Yuan Chen, Diptendu Sinha Roy

Outline

This paper proposes a Two-stage Cross-modal Video Anomaly Detection System (TCVADS) to address Weakly Supervised Monitoring Anomaly Detection (WSMAD) for smart city monitoring. The system is designed for efficient, accurate, and interpretable anomaly detection on edge devices. TCVADS works in two stages: coarse-grained classification and fine-grained analysis. In the first stage, a time-series analysis module (the teacher model) extracts features and transfers them to a simplified convolutional neural network (the student model) through knowledge distillation; the student then performs binary classification. When an anomaly is detected, the second stage is activated: it performs fine-grained multi-class classification through cross-modal contrastive learning using CLIP, and improves interpretability through specially designed triplet text relationships. The authors report that, in their experiments, TCVADS outperforms existing methods in model performance, detection efficiency, and interpretability.
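To make the teacher-to-student transfer in the first stage more concrete, here is a minimal sketch of a distillation loss for a binary (normal versus anomalous) classifier. The summary does not state the paper's actual loss, so this uses the common logit-based formulation (hard-label cross-entropy plus a temperature-softened KL term) as a stand-in. The function names, the `temperature` and `alpha` parameters, and the example logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Assumed loss: alpha * CE(label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Cross-entropy of the student's ordinary (T = 1) prediction
    # against the hard label.
    student_probs = softmax(student_logits)
    ce = -math.log(student_probs[label])

    # KL divergence between the temperature-softened teacher and
    # student distributions; terms with zero teacher mass contribute 0.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q)
             for p, q in zip(p_teacher, p_student) if p > 0)

    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl

# Hypothetical logits in the order [normal, anomalous].
agree = distillation_loss([0.0, 2.0], [0.0, 2.0], label=1)
disagree = distillation_loss([2.0, 0.0], [0.0, 2.0], label=1)
print(agree < disagree)
```

When the student matches the teacher, the KL term vanishes and only the weighted cross-entropy remains. A student that contradicts the teacher is penalized by both terms.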

Takeaways, Limitations

Takeaways:
Provides an efficient, accurate, and interpretable anomaly detection system for edge devices.
Uses knowledge distillation and cross-modal contrastive learning to improve model performance.
Combines coarse-grained classification with fine-grained analysis in a two-stage design to improve accuracy and efficiency.
Makes a practical contribution to smart city monitoring.
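The two-stage design can be sketched as a simple cascade: a cheap first-stage score gates whether the more expensive second stage runs at all. This is only an illustration under stated assumptions, not the authors' implementation. The anomaly score stands in for the student model's output, the toy three-dimensional vectors stand in for CLIP video and text embeddings in a shared space, and the class names, the `threshold` value, and the function names are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def two_stage_detect(clip_embedding, anomaly_score, class_text_embeddings,
                     threshold=0.5):
    """Stage 1 gates on the anomaly score; stage 2 picks the closest class text."""
    # Stage 1: coarse binary decision. Below the threshold, the
    # second stage is skipped entirely.
    if anomaly_score < threshold:
        return "normal"
    # Stage 2: fine-grained classification by similarity between the
    # video embedding and each class's text embedding.
    return max(class_text_embeddings,
               key=lambda name: cosine(clip_embedding, class_text_embeddings[name]))

# Hypothetical anomaly classes with placeholder text embeddings.
classes = {"fighting": [1.0, 0.0, 0.0], "fire": [0.0, 1.0, 0.0]}
print(two_stage_detect([0.9, 0.1, 0.0], 0.2, classes))  # normal
print(two_stage_detect([0.9, 0.1, 0.0], 0.8, classes))  # fighting
```

Because the second stage runs only for clips that pass the gate, its cost scales with the number of flagged clips rather than with the total amount of video.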
Limitations:
Further research is needed to evaluate how well the proposed model generalizes.
Robustness across different types of anomalies still needs to be assessed.
Further research is needed on implementation and deployment in real-world smart city environments.
Using CLIP may increase computational cost, which should be taken into account.