Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech

Created by
  • Haebom

Author

Nam-Gyu Kim, Deok-Hyeon Cho, Seung-Bin Kim, Seong-Whan Lee

Outline

This paper proposes Spotlight-TTS to address the challenge of high-quality, expressive speech synthesis, building on recent work that extracts style embeddings from reference speech for expressive text-to-speech (TTS). Spotlight-TTS emphasizes style exclusively through voiced-aware style extraction and style direction adjustment. Voiced-aware style extraction focuses on voiced segments, which carry high style relevance, while maintaining continuity across different speech segments to enhance expressiveness. Style direction adjustment then improves speech quality by adjusting the direction of the extracted style so that it integrates optimally into the TTS model. Experimental results show that Spotlight-TTS outperforms baseline models in expressiveness, overall speech quality, and style transferability; speech samples are publicly available.
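To make the "voiced-aware" idea concrete, here is a minimal, hypothetical sketch of weighting frame-level style features toward voiced frames (where pitch is present) when pooling them into a single style vector. The function name, weights, and pooling scheme are illustrative assumptions only, not the paper's actual method.

```python
# Hypothetical sketch: pool per-frame style features into one style vector,
# giving voiced frames (f0 > 0) more weight than unvoiced ones.
# All names and weight values here are illustrative, not from the paper.

def voiced_aware_pool(frame_feats, f0, voiced_w=1.0, unvoiced_w=0.1):
    """frame_feats: list of T feature vectors (each length D); f0: T pitch values."""
    weights = [voiced_w if pitch > 0 else unvoiced_w for pitch in f0]
    total = sum(weights)
    dim = len(frame_feats[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, frame_feats):
        for d in range(dim):
            pooled[d] += (w / total) * feat[d]
    return pooled

# Toy usage: 4 frames with 2-dim features; frames 1 and 2 are voiced,
# so they dominate the pooled style vector.
feats = [[1.0, 0.0], [2.0, 2.0], [4.0, 2.0], [0.0, 0.0]]
f0 = [0.0, 110.0, 120.0, 0.0]
print(voiced_aware_pool(feats, f0))
```

In a real system the weights would come from learned attention rather than a fixed constant, but the sketch shows why voiced frames, which carry most prosodic style cues, end up dominating the extracted style embedding.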

Takeaways, Limitations

Takeaways:
Voiced-aware style extraction and style direction adjustment enable high-quality, expressive speech synthesis.
Improves expressiveness and speech quality over existing TTS models.
Demonstrates strong style transferability.
Publicly available audio samples make the findings easy to verify.
Limitations:
The paper does not explicitly discuss its own limitations.
Limited detail on the experimental setup and datasets, so generalizability needs further review.
Results may depend on the specific language or speech data used.