Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets

Created by
  • Haebom

Author

Chenlin Liu, Minghui Fang, Patrick Zhang, Wei Zhou, Jie Gao, Jiqing Han

Outline

This paper proposes GOAT (GFlOwNet-guided distribution Alignment), a post-training framework for mitigating hallucinations in language-model (LM)-based text-to-speech (TTS) systems without requiring excessive training resources or adding inference latency. Starting from an analysis showing a strong correlation between model uncertainty and hallucinations, the authors reformulate TTS generation as a trajectory-flow optimization problem and adopt an enhanced sub-trajectory balance objective with sharpened internal rewards as the target distribution. Reward-temperature decay and learning-rate optimization are integrated to balance training stability and performance. Experiments demonstrate strong generalization and effectiveness, reducing the character error rate by over 50% and uncertainty by up to 58% on challenging test cases.
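To make the trajectory-flow framing concrete, here is a minimal sketch of the generic sub-trajectory balance (SubTB) loss from the GFlowNet literature — not the paper's exact enhanced objective. The function name, the λ-weighting scheme, and the assumption that the backward policy is deterministic (each token sequence has a unique prefix, so backward log-probabilities vanish) are our own illustrative choices.

```python
def subtb_loss(log_flows, log_pf, lam=0.9):
    """Sub-trajectory balance loss for a single trajectory (toy sketch).

    log_flows: log F(s_0), ..., log F(s_n) -- log state flows (n+1 values)
    log_pf:    log P_F(s_{k+1} | s_k) for k = 0..n-1 -- forward log-probs
    lam:       geometric weight on sub-trajectory length

    For an autoregressive LM, the backward policy is deterministic
    (every sequence has a unique prefix), so log P_B terms are zero.
    The loss penalizes, for every sub-trajectory (i, j), the mismatch
    between log F(s_i) + sum of forward log-probs and log F(s_j).
    """
    n = len(log_pf)
    assert len(log_flows) == n + 1
    num, den = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            w = lam ** (j - i)
            resid = log_flows[i] + sum(log_pf[i:j]) - log_flows[j]
            num += w * resid ** 2
            den += w
    return num / den
```

When the flows are perfectly consistent with the forward policy (log F(s_j) = log F(s_i) + Σ log P_F), every residual is zero and the loss vanishes; any imbalance yields a positive penalty.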

Takeaways, Limitations

Takeaways:
  • Presents a post-training framework that effectively mitigates hallucinations in LM-based TTS without excessive resources or inference delay.
  • Offers a novel approach to the hallucination problem grounded in an analysis of model uncertainty.
  • Improves performance through an enhanced sub-trajectory balance objective and sharpened internal rewards.
  • Balances training stability and performance via reward-temperature decay and learning-rate optimization.
  • Shows strong performance and generalization even on difficult test cases.
Limitations:
  • Performance may depend on the specific datasets or models used.
  • Generalization to other types of hallucinations or errors remains to be assessed.
  • Further work is needed to apply the method to real commercial systems.
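The reward sharpening and temperature decay mentioned above can be illustrated with a small toy sketch (our own example, not the paper's implementation): raising rewards to the power 1/T concentrates the target distribution on high-reward generations as T is lowered.

```python
import math

def sharpen(rewards, temperature):
    """Sharpen a reward distribution: p_i proportional to r_i^(1/T).

    Computed in log space with a max-subtraction for numerical
    stability. Lowering the temperature T concentrates probability
    mass on the highest-reward outcomes, which is the intuition
    behind annealing a reward temperature during training.
    """
    logits = [math.log(r) / temperature for r in rewards]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

At T = 1 this reduces to plain normalization of the rewards; as T shrinks, the distribution approaches a point mass on the best-rewarded outcome.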