Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing

Created by
  • Haebom

Authors

Hanfang Cui, Longfei Song, Li Li, Dongxing Xu, Yanhua Long

Outline

This paper systematically evaluates how state-of-the-art self-supervised learning (SSL) models complement one another in sound event detection (SED) and offers guidelines for model selection and integration. The authors propose a framework that combines several SSL representations (e.g., BEATs, HuBERT, and WavLM) through three fusion strategies: individual SSL embedding integration, dual-modal fusion, and global aggregation. Experiments on DCASE 2023 Task 4 show that dual-modal fusion (e.g., CRNN+BEATs+WavLM) yields complementary performance gains, while CRNN+BEATs performs best among the configurations that use a single SSL model. The authors also introduce normalized sound event bounding boxes (nSEBBs), an adaptive post-processing method that dynamically adjusts event boundary predictions and improves the PSDS1 of standalone SSL models by up to 4%. These results highlight the compatibility and complementarity of SSL architectures and provide guidance for task-specific fusion and robust SED system design.
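The summary does not say how the embeddings are actually combined, so the following is only a minimal sketch of one common way to fuse frame-level features: resample each SSL stream to the CRNN frame rate, then concatenate along the feature axis. The names `resample_frames` and `fuse_by_concatenation`, the nearest-neighbour alignment, and the choice of concatenation are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List

Frames = List[List[float]]  # one feature vector per time frame


def resample_frames(frames: Frames, target_len: int) -> Frames:
    """Nearest-neighbour resampling of a frame sequence to target_len frames.

    Hypothetical helper: SSL encoders and a CNN front end usually run at
    different frame rates, so the streams must be aligned before fusion.
    """
    if not frames or target_len <= 0:
        raise ValueError("need at least one frame and a positive target length")
    src_len = len(frames)
    return [
        frames[min(src_len - 1, (i * src_len) // target_len)]
        for i in range(target_len)
    ]


def fuse_by_concatenation(cnn_frames: Frames, *ssl_streams: Frames) -> Frames:
    """Align every SSL stream to the CNN frame count, then concatenate per frame.

    Passing one SSL stream mimics a single-SSL setup (e.g. CRNN+BEATs);
    passing two mimics a dual-modal setup (e.g. CRNN+BEATs+WavLM).
    This is an illustrative assumption, not the paper's fusion module.
    """
    target_len = len(cnn_frames)
    aligned = [resample_frames(stream, target_len) for stream in ssl_streams]
    fused: Frames = []
    for t in range(target_len):
        vec = list(cnn_frames[t])      # start from the CNN features
        for stream in aligned:
            vec.extend(stream[t])      # append each aligned SSL embedding
        fused.append(vec)
    return fused


# Toy example: 4 CNN frames (dim 2), 2 "BEATs" frames (dim 3), 8 "WavLM" frames (dim 1).
cnn = [[0.1, 0.2] for _ in range(4)]
beats = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]
wavlm = [[float(i)] for i in range(8)]
fused = fuse_by_concatenation(cnn, beats, wavlm)
print(len(fused), len(fused[0]))
```

In a real system the fused sequence would feed the recurrent part of the CRNN, and a learned projection or attention-based fusion could replace plain concatenation. Those choices are outside what this summary describes.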

Takeaways, Limitations

Takeaways:
Fusing multiple SSL models can improve SED performance.
Experiments demonstrate the effectiveness of the dual-modal fusion strategy.
The nSEBBs post-processing method further improves SED performance.
The paper provides guidelines for selecting SSL models and fusion strategies suited to a specific task.
Limitations:
Experimental results are reported on a single dataset only (DCASE 2023 Task 4).
Generalizability to other SED datasets and to a wider range of SSL models needs to be verified.
Further research is needed on the applicability and generalization performance of nSEBBs.
The computational cost and complexity of the proposed fusion framework are not analyzed.