Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks

Created by
  • Haebom

Author

Friedrich Wolf-Monheim

Outline

This paper compares the performance of several spectral and rhythm features (mel-scaled spectrograms, MFCCs, cyclic tempograms, STFT chromagrams, CQT chromagrams, and CENS chromagrams) for audio classification with deep convolutional neural networks (CNNs). Using the ESC-50 dataset (2,000 environmental audio recordings), it evaluates accuracy, precision, recall, and F1 score at both the category level and the class level. Mel-scaled spectrograms and MFCCs significantly outperform the other features in CNN-based audio classification.
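To make the two evaluation levels concrete, here is a minimal sketch of how the four metrics could be computed for class-level and category-level predictions. It is not the paper's evaluation code. The helper names `macro_scores` and `to_category`, the toy labels, and the choice of macro averaging are all assumptions for illustration. The class-to-category mapping assumes class IDs are grouped in consecutive blocks of ten (five categories of ten classes each), which should be checked against the dataset's metadata before use.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Macro averaging is an assumption here; the summary does not state
    which averaging scheme the paper uses.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for label t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the truth but was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def to_category(class_id, classes_per_category=10):
    """Hypothetical mapping: class IDs 0-9 form category 0, 10-19 category 1, ..."""
    return class_id // classes_per_category


# Toy labels, invented for illustration only (not results from the paper).
y_true = [0, 3, 12, 17, 41]
y_pred = [0, 4, 12, 25, 41]

class_scores = macro_scores(y_true, y_pred)
category_scores = macro_scores(
    [to_category(c) for c in y_true],
    [to_category(c) for c in y_pred],
)
```

In this toy example, predicting class 4 instead of class 3 is an error at the class level but not at the category level, since both fall in the same block of ten. This is why category-level scores are generally at least as high as class-level accuracy for the same model.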

Takeaways, Limitations

Takeaways:
Demonstrates experimentally that mel-scaled spectrograms and MFCCs are effective input features for CNN-based audio classification.
Compares a range of audio features side by side, offering guidance on feature selection for CNN-based audio classification.
Limitations:
Generalization is not validated, because only a single dataset (ESC-50) is used.
Other CNN architectures and hyperparameter tuning are not explored.
Useful audio features beyond the six analyzed are not considered.