Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering

Created by
  • Haebom

Author

Shahana Yasmin Chowdhury, Bithi Banik, Md Tamjidul Hoque, Shreya Banerjee

Outline

In this paper, we propose DCRF-BiLSTM, a speech emotion recognition (SER) model that recognizes seven emotions (neutral, happy, sad, angry, fear, disgust, and surprise). The model is trained on five datasets: RAVDESS, TESS, SAVEE, EmoDB, and CREMA-D, and achieves high accuracy on each individually (RAVDESS 97.83%, SAVEE 97.02%, CREMA-D 95.10%, and 100% on both TESS and EmoDB). Notably, on the combined RAVDESS + TESS + SAVEE (R+T+S) set, accuracy reaches 98.82%, outperforming previous studies. This is also the first study to evaluate all five benchmark datasets jointly, achieving an overall accuracy of 93.76% and demonstrating the robustness and generalization performance of the DCRF-BiLSTM framework.
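The summary does not detail the architecture or feature pipeline, so the following is only a minimal sketch of the BiLSTM backbone such a model typically builds on. It assumes 40-dimensional MFCC frame features, two BiLSTM layers, and mean pooling over time; the DCRF component is omitted, and all layer sizes and the feature choice are illustrative assumptions rather than the authors' exact design.

```python
# Minimal sketch of a BiLSTM-based SER classifier (hypothetical layer sizes;
# the paper's DCRF component is NOT reproduced here).
import torch
import torch.nn as nn

EMOTIONS = ["neutral", "happy", "sad", "angry", "fear", "disgust", "surprise"]

class BiLSTMEmotionClassifier(nn.Module):
    """BiLSTM over per-frame acoustic features (e.g., 40-dim MFCCs)."""
    def __init__(self, n_features=40, hidden=128, n_classes=len(EMOTIONS)):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):          # x: (batch, frames, n_features)
        out, _ = self.bilstm(x)    # (batch, frames, 2 * hidden)
        pooled = out.mean(dim=1)   # average over time frames
        return self.head(pooled)   # (batch, n_classes) emotion logits

# Usage: a batch of 8 utterances, each ~300 frames of 40 MFCC coefficients.
model = BiLSTMEmotionClassifier()
logits = model(torch.randn(8, 300, 40))
print(logits.shape)  # torch.Size([8, 7])
```

In practice, the per-frame features would come from a feature-engineering step (e.g., MFCCs extracted with a library such as librosa) rather than random tensors.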

Takeaways, Limitations

Takeaways:
The DCRF-BiLSTM model achieves high accuracy across multiple speech emotion recognition datasets, demonstrating its effectiveness and generalization performance.
By evaluating five major benchmark datasets both individually and in combination, the study overcomes the limitations of prior work and provides a more comprehensive performance evaluation.
The results suggest that the DCRF-BiLSTM model is a promising candidate for practical speech emotion recognition applications.
Limitations:
The influence of real-world noise and background sounds may not be adequately considered.
The balance and potential bias of the datasets used may not be analyzed in sufficient depth.
Further research may be needed on the interpretability of the model.