Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

Created by
  • Haebom

Author

Md Sazzadul Islam Ridoy, Sumi Akter, Md. Aminur Rahman

Outline

In this paper, we study two state-of-the-art automatic speech recognition (ASR) models, OpenAI’s Whisper (Small & Large-V2) and Facebook’s Wav2Vec-BERT, to evaluate their speech recognition performance on Bangla, a low-resource language. Using two public datasets, Mozilla Common Voice-17 and OpenSLR, we compare model performance in terms of word error rate (WER), character error rate (CER), training time, and computational efficiency, applying systematic fine-tuning and hyperparameter optimization (learning rate, number of epochs, and model checkpoint selection). The results confirm that the Wav2Vec-BERT model outperforms the Whisper model on all key evaluation metrics while requiring fewer computational resources.
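The WER and CER metrics used in the comparison are both Levenshtein (edit) distances normalized by reference length, computed over words and characters respectively. A minimal sketch of the computation (an illustration, not the authors' evaluation code):

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences via dynamic programming."""
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

In practice, toolkits such as the Hugging Face `evaluate` library provide equivalent `wer`/`cer` metrics; the sketch above just makes the definition explicit.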

Takeaways, Limitations

Takeaways: We experimentally demonstrated that the Wav2Vec-BERT model provides more efficient and accurate speech recognition than the Whisper model in low-resource language settings. This offers valuable guidance for developing robust speech recognition systems for low-resource languages.
Limitations: This study is limited to two specific datasets and two ASR models, so generalizability to other low-resource languages or models requires further study. It also lacks an in-depth analysis of how the size and quality of the datasets affect model performance.