Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS

Created by
  • Haebom

Authors

Anuprabha M, Krishna Gurugubelli, Anil Kumar Vuppala

Outline

This paper addresses assistive technology for dysarthric speech, which is hard to develop because so little data is available. Recent advances in neural speech synthesis, particularly zero-shot voice cloning, make it possible to generate synthetic speech for data augmentation, but they can also introduce bias against dysarthric speech. Using the TORGO dataset, the study investigates how well the state-of-the-art F5-TTS model clones dysarthric speech in terms of intelligibility, speaker similarity, and prosody preservation. It also applies two fairness metrics, disparate impact and parity difference, to assess disparities across dysarthria severity levels.
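The two fairness metrics named above can be illustrated with a minimal sketch. It uses the common definitions: disparate impact as the ratio of favorable-outcome rates between a group and a reference group, and parity difference as the difference between those rates. How the paper itself defines a favorable outcome is not stated in this summary, so the threshold, the severity labels, the scores, and all function names below are hypothetical.

```python
from collections import defaultdict


def favorable_rates(records, threshold):
    """Return, per group, the fraction of utterances whose score meets the threshold.

    `records` is an iterable of (group, score) pairs. The threshold that
    defines a "favorable" outcome is an assumption made for illustration.
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, score in records:
        totals[group] += 1
        if score >= threshold:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


def disparate_impact(rates, group, reference):
    """Ratio of favorable rates; 1.0 means parity with the reference group."""
    if rates[reference] == 0:
        return float("nan")  # ratio is undefined when the reference rate is zero
    return rates[group] / rates[reference]


def parity_difference(rates, group, reference):
    """Difference of favorable rates; 0.0 means parity with the reference group."""
    return rates[group] - rates[reference]


# Hypothetical per-utterance scores (for example, a speaker-similarity
# score in [0, 1]) grouped by severity level. Not data from the paper.
records = [
    ("mild", 0.9), ("mild", 0.8), ("mild", 0.7), ("mild", 0.4),
    ("severe", 0.9), ("severe", 0.3), ("severe", 0.2), ("severe", 0.1),
]

rates = favorable_rates(records, threshold=0.5)
di = disparate_impact(rates, group="severe", reference="mild")
pd = parity_difference(rates, group="severe", reference="mild")
print(f"rates={rates}, disparate impact={di:.3f}, parity difference={pd:.3f}")
```

In this toy data the severe group reaches the threshold far less often than the mild group, so the disparate impact falls well below 1 and the parity difference is negative. Both values indicate a disparity against the severe group under these assumed definitions.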

Takeaways, Limitations

Takeaways: The study finds that F5-TTS exhibits a strong bias toward speech intelligibility over speaker and prosody preservation when synthesizing dysarthric speech. These insights can help integrate fairness-aware dysarthric speech synthesis and support the development of more inclusive speech technologies.
Limitations: The paper does not explicitly state its limitations. However, the results depend on a single dataset (TORGO) and a single model (F5-TTS), so further analysis may be needed to see whether they generalize. The scope and limits of bias analysis based on fairness metrics also receive little discussion.