Daily Arxiv

This page curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge

Created by
  • Haebom

Authors

Aditya Kamlesh Parikh, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik

Outline

This paper studies how to improve the efficiency of Goodness of Pronunciation (GOP), a pronunciation-quality metric used in computer-assisted pronunciation training (CAPT) systems. Conventional GOP relies on forced alignment, which is prone to labeling and segmentation errors caused by acoustic variation. Alignment-free methods have been proposed, but they are computationally expensive and scale poorly with the length of the phoneme sequence and the size of the phoneme inventory. The paper therefore proposes a substitution-aware alignment-free GOP that restricts phoneme substitutions based on phoneme clusters and common learner errors. The proposed method is evaluated on two L2 English speech datasets, My Pronunciation Coach (MPC) and SpeechOcean762, and is shown to outperform existing methods.
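
The idea of restricting substitutions can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster contents, the learner-error table, and the function name allowed_substitutions are all hypothetical, chosen only to show how limiting candidates to cluster members and known learner confusions shrinks the set of substitutions that must be scored for each canonical phoneme.

```python
from typing import Dict, Set

# Hypothetical phoneme clusters (illustrative only, not taken from the paper).
PHONEME_CLUSTERS: Dict[str, Set[str]] = {
    "front_vowels": {"IY", "IH", "EH", "AE"},
    "voiceless_fricatives": {"F", "TH", "S", "SH"},
    "voiced_fricatives": {"V", "DH", "Z", "ZH"},
}

# Hypothetical common learner errors: canonical phoneme -> typical substitutes.
COMMON_LEARNER_ERRORS: Dict[str, Set[str]] = {
    "TH": {"T", "S"},
    "DH": {"D", "Z"},
    "IH": {"IY"},
}


def allowed_substitutions(phoneme: str) -> Set[str]:
    """Return the restricted substitution candidates for one canonical phoneme."""
    candidates: Set[str] = set()
    # Members of every cluster that contains the canonical phoneme.
    for members in PHONEME_CLUSTERS.values():
        if phoneme in members:
            candidates |= members
    # Substitutes listed as common learner errors for this phoneme.
    candidates |= COMMON_LEARNER_ERRORS.get(phoneme, set())
    # The canonical phoneme is not a substitution of itself.
    candidates.discard(phoneme)
    return candidates


if __name__ == "__main__":
    for p in ["TH", "IH", "K"]:
        print(p, sorted(allowed_substitutions(p)))
```

Under these assumed tables, a phoneme that belongs to no cluster and has no listed learner errors gets an empty candidate set, so only the phonemes with plausible confusions contribute extra substitution hypotheses.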

Takeaways, Limitations

Takeaways:
Presents a novel method that makes GOP computation more efficient without relying on forced alignment.
Improves accuracy by taking phoneme clusters and common learner errors into account.
Validates performance on two L2 English datasets, including children's speech data.
Can help make CAPT systems more practical.
Limitations:
The size of the performance improvement may vary depending on the dataset.
Additional research on more languages and datasets is needed.
Further research may be needed on how phoneme clusters and common learner errors are defined.