Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

ForTIFAI: Fending Off Recursive Training Induced Failure for AI Models

Created by
  • Haebom

Authors

Soheil Zibakhsh Shabgahi, Pedram Aghazadeh, Azalia Mirhoseini, Farinaz Koushanfar

Outline

This paper proposes a novel approach to the problem of model collapse caused by repeated training on synthetic data, motivated by projections that most training data will be machine-generated by 2030. We identify the model's overconfidence in its own generated data as a primary cause of collapse and propose Truncated Cross Entropy (TCE), a confidence-aware loss function that downweights high-confidence predictions. Theoretical and experimental analyses show that TCE extends the performance retention period before model collapse by more than 2.3 times and that it generalizes across modalities. In conclusion, loss function design offers a simple yet powerful tool for maintaining the quality of generative models in the era of synthetic data.
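The summary does not spell out the exact truncation rule, but a minimal sketch of a confidence-aware loss in the spirit of TCE might look like the following. It assumes the loss simply masks out tokens whose target-class probability already exceeds a confidence threshold `tau`; the threshold value, the hard-masking rule, and the function name are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def truncated_cross_entropy(logits: torch.Tensor,
                            targets: torch.Tensor,
                            tau: float = 0.95,
                            ignore_index: int = -100) -> torch.Tensor:
    """Hypothetical sketch of a confidence-aware cross entropy.

    Tokens for which the model already assigns probability > tau to the
    correct class contribute nothing to the loss, so training stops
    reinforcing the model's own high-confidence (often self-generated)
    predictions. `tau` and the hard-masking rule are assumptions.
    """
    log_probs = F.log_softmax(logits, dim=-1)            # (N, vocab)
    safe_targets = targets.clamp_min(0)                  # guard against ignore_index
    target_log_p = log_probs.gather(-1, safe_targets.unsqueeze(-1)).squeeze(-1)
    keep = (targets != ignore_index) & (target_log_p.exp() <= tau)
    if keep.sum() == 0:                                  # every token was truncated
        return logits.new_zeros(())
    return -target_log_p[keep].mean()

# Usage with flattened language-model outputs:
# loss = truncated_cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
```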

Takeaways, Limitations

Takeaways:
• We identify model overconfidence as a main cause of model collapse and propose a new loss function (TCE) to address it.
• We experimentally demonstrate that TCE significantly delays model collapse, extending the model's performance retention period by more than 2.3 times.
• We show that the proposed method is a model-agnostic framework applicable to various modalities.
• We suggest that loss function design is an effective strategy for maintaining the quality of generative models in the era of synthetic data.
Limitations:
• Further research may be needed to determine the optimal parameters of the TCE loss function.
• TCE's generalization across different synthetic data generation methods and model architectures may require further validation.
• Additional research is needed to evaluate the method on, and scale it to, real large-scale datasets.