SpeechWeave is a pipeline that automates the generation of multilingual, domain-specific synthetic datasets for training high-quality text-to-speech (TTS) models. It uses an LLM to generate text data, addresses text normalization, and synthesizes speech audio with a standardized speaker voice. Experimental results show that the data SpeechWeave generates is 10-48% more diverse than that of existing methods across various linguistic and phonetic metrics, that its text normalization is approximately 97% accurate, and that the resulting speech audio is speaker-normalized.
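To make a claim such as "X% more diverse" concrete, the sketch below computes one common lexical diversity score, distinct-n (unique n-grams divided by total n-grams), and the relative gain of one corpus over a baseline. This is only an illustration: the summary above does not say which linguistic and phonetic metrics SpeechWeave reports, so distinct-n, the function names `distinct_n` and `relative_gain`, and the toy corpora are all assumptions rather than the authors' evaluation method.

```python
def distinct_n(sentences, n=2):
    """Ratio of unique n-grams to total n-grams over a corpus.

    Tokenization is a plain lowercase whitespace split (an assumption;
    a real evaluation would likely use a language-aware tokenizer).
    """
    ngrams = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        ngrams.extend(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def relative_gain(candidate_score, baseline_score):
    """Percentage improvement of candidate_score over baseline_score."""
    return 100.0 * (candidate_score - baseline_score) / baseline_score


# Toy corpora (hypothetical, for illustration only).
baseline_corpus = ["the cat sat", "the cat sat"]
candidate_corpus = ["the cat sat", "a dog ran"]

baseline_score = distinct_n(baseline_corpus)
candidate_score = distinct_n(candidate_corpus)
gain = relative_gain(candidate_score, baseline_score)
```

In this toy case the baseline repeats the same two bigrams, so half of its bigrams are unique, while every bigram in the candidate corpus is unique. A phonetic analogue would apply the same counting to phoneme sequences instead of word tokens.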