Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks

Created by
  • Haebom

Author

Friedrich Wolf-Monheim

Outline

This paper compares the performance of several spectral and rhythm features for audio classification with deep convolutional neural networks (CNNs): mel-scaled spectrograms, mel-frequency cepstral coefficients (MFCCs), cyclic tempograms, short-time Fourier transform (STFT) chromagrams, constant-Q transform (CQT) chromagrams, and chroma energy normalized statistics (CENS) chromagrams. Using the ESC-50 dataset (2,000 environmental audio recordings), the author measures accuracy, precision, recall, and F1 score for each feature at both the category level and the class level. All experiments use an end-to-end deep learning pipeline.
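
To make the four reported metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1 can be computed at the class level and then again at the category level by mapping each class to its parent category. It is an illustration only, not the paper's evaluation code: the macro averaging, the function name `classification_metrics`, the `CLASS_TO_CATEGORY` mapping, and the toy labels are all assumptions made for this example.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption here; the summary does not state
    which averaging scheme the paper uses.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gets a false positive
            fn[truth] += 1  # true label gets a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical class-to-category mapping, for illustration only.
CLASS_TO_CATEGORY = {
    "dog": "animals",
    "cat": "animals",
    "rain": "natural",
    "siren": "urban",
}

# Toy labels, not data from the paper.
y_true = ["dog", "dog", "cat", "rain", "siren", "siren"]
y_pred = ["dog", "cat", "cat", "rain", "siren", "dog"]

class_metrics = classification_metrics(y_true, y_pred)
category_metrics = classification_metrics(
    [CLASS_TO_CATEGORY[c] for c in y_true],
    [CLASS_TO_CATEGORY[c] for c in y_pred],
)
```

In this toy example, confusing a dog with a cat counts as an error at the class level but not at the category level, which is why category-level scores are never lower than class-level accuracy for the same predictions.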

Takeaways, Limitations

Takeaways: The paper finds that mel-scaled spectrograms and MFCCs significantly outperform the other spectral and rhythm features in audio classification tasks using deep CNNs. This result can inform feature selection when developing future audio classification models.
Limitations: Because the experiments use only the ESC-50 dataset, further research is needed to determine how well the results generalize to other datasets. The paper does not compare different CNN architectures, and it does not analyze performance when different features are combined.