Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets

Created by
  • Haebom

Authors

Chenlin Liu, Minghui Fang, Patrick Zhang, Wei Zhou, Jie Gao, Jiqing Han

Overview

This paper proposes GOAT (GFlOwNet-guided Distribution Alignment), a post-training framework for mitigating hallucinations in language model-based text-to-speech (TTS) systems. Unlike existing methods, GOAT does not require excessive training resources and does not add inference delay. The authors first analyze model uncertainty and find that it correlates strongly with hallucination. Based on this, they reframe TTS generation as a trajectory flow optimization problem and train with an enhanced sub-trajectory balance objective, using a sharpened internal reward as the target distribution. Reward temperature decay and learning rate optimization are added to balance stability and performance. In experiments, GOAT reduces character error rates by more than 50% and uncertainty by up to 58% on challenging test cases, which the authors present as evidence of its generalization and effectiveness.
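To make the sub-trajectory balance idea more concrete, here is a minimal sketch of a generic SubTB(lambda) loss for one autoregressive trajectory with a temperature-scaled terminal reward. It is not the paper's implementation: the function name `subtb_loss`, its arguments, the weighting parameter `lam`, and the choice to divide the log-reward by a temperature are all assumptions made for illustration. Because each token sequence has exactly one parent prefix, the backward policy is taken to be 1 and drops out of the residual.

```python
import math


def subtb_loss(log_flows, log_pf, log_reward, temperature=1.0, lam=0.9):
    """Generic SubTB(lambda) loss for a single autoregressive trajectory.

    log_flows:  log F(s_0), ..., log F(s_{n-1}) for the n non-terminal states
    log_pf:     log P_F(s_{k+1} | s_k) for each of the n generation steps
    log_reward: log R(x) of the finished sequence (hypothetical internal reward)
    temperature: values below 1 sharpen the target, since R^(1/T) is used
    lam:        geometric weight on sub-trajectory length
    """
    n = len(log_pf)
    if len(log_flows) != n:
        raise ValueError("need one flow estimate per non-terminal state")

    # Assumption: the terminal flow equals the tempered log-reward.
    flows = list(log_flows) + [log_reward / temperature]

    # Prefix sums let us read off the log-probability of any sub-trajectory.
    prefix = [0.0]
    for lp in log_pf:
        prefix.append(prefix[-1] + lp)

    total, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Flow into s_i, pushed forward along the policy, should match F(s_j).
            residual = flows[i] + (prefix[j] - prefix[i]) - flows[j]
            weight = lam ** (j - i)
            total += weight * residual ** 2
            weight_sum += weight
    return total / weight_sum
```

When the flows, policy, and reward are mutually consistent the loss is zero; lowering `temperature` changes the target, so the same trajectory is no longer balanced and the loss becomes positive.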

Takeaways, Limitations

Takeaways:
Presents a post-training method that alleviates hallucination in language model-based TTS without excessive training resources or inference delay.
Grounds the hallucination mitigation strategy in an analysis of model uncertainty.
Can be applied to existing models easily because it is a post-training framework.
Experiments confirm substantial performance gains and good generalization ability.
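As a rough illustration of what a model uncertainty analysis can look like, the sketch below scores a generated sequence by the mean Shannon entropy of its per-step token distributions. This is one common uncertainty proxy and an assumption here; the summary does not state which uncertainty measure the paper uses, and the function names are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_sequence_entropy(step_distributions):
    """Average per-step entropy over a generated sequence.

    step_distributions: list of probability vectors, one per generated token.
    Higher values mean the model was less certain while generating.
    """
    if not step_distributions:
        raise ValueError("sequence must contain at least one step")
    entropies = [token_entropy(p) for p in step_distributions]
    return sum(entropies) / len(entropies)
```

Under this proxy, a sequence generated from near-uniform distributions scores high and one generated from peaked distributions scores near zero, so the score could be compared against error rates per utterance.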
Limitations:
The effectiveness of the proposed method may be limited to specific datasets or models.
Generalization across different types of hallucination needs further evaluation.
Performance in real-world application environments needs further evaluation.