Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models

Created by
  • Haebom

Authors

Karan Dua, Puneet Mittal, Ranjeet Gupta, Hitesh Laxmichand Patel

Outline

SpeechWeave is a pipeline that automates the generation of multilingual, domain-specific synthetic datasets for training high-quality text-to-speech (TTS) models. It uses an LLM to generate text data, addresses text normalization, and generates synthetic speech data with a standardized speaker voice. The reported experiments show that SpeechWeave generates data that is 10-48% more diverse than existing methods across various linguistic and phonetic metrics, normalizes text with approximately 97% accuracy, and produces speaker-normalized speech audio.
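
As a rough illustration of the three stages described above (text generation, text normalization, speech synthesis), the following minimal sketch wires them together. Every name here (Sample, run_pipeline, toy_normalize, type_token_ratio) is hypothetical and not taken from the paper, the three stages are passed in as plain callables rather than real LLM or TTS calls, and the type-token ratio is only a simple stand-in for a diversity measure, not necessarily one of the metrics the authors report.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One synthetic training example (hypothetical structure)."""
    raw_text: str
    normalized_text: str
    audio: bytes


def run_pipeline(
    prompts: List[str],
    generate_text: Callable[[str], str],   # stand-in for an LLM call
    normalize_text: Callable[[str], str],  # stand-in for text normalization
    synthesize: Callable[[str], bytes],    # stand-in for a TTS engine
) -> List[Sample]:
    """Run generate -> normalize -> synthesize for each prompt."""
    samples = []
    for prompt in prompts:
        raw = generate_text(prompt)
        normalized = normalize_text(raw)
        samples.append(Sample(raw, normalized, synthesize(normalized)))
    return samples


# Toy lookup table, purely illustrative; real normalization is far richer.
_TOY_EXPANSIONS = {"3": "three", "km": "kilometers"}


def toy_normalize(text: str) -> str:
    """Expand a few written forms into spoken forms via a lookup table."""
    return " ".join(_TOY_EXPANSIONS.get(tok, tok) for tok in text.split())


def type_token_ratio(texts: List[str]) -> float:
    """Unique tokens divided by total tokens; higher means more lexical variety."""
    tokens = [tok.lower() for text in texts for tok in text.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

In this sketch the stages stay swappable: any function with the same signature can replace the toy callables, so the same loop could be reused across languages or domains.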

Takeaways, Limitations

Takeaways:
Enables scalable generation of high-quality data for training TTS models.
Improves diversity, normalization, and voice consistency in the generated data.
Presents a practical solution to the problem of large-scale voice recording in commercial TTS systems.
Limitations:
Limitations are not specified in the paper. (No additional information is provided beyond what is stated in the abstract.)