Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

Out-of-Distribution Detection using Synthetic Data Generation

Created by
  • Haebom

Authors

Momin Abbas, Muneeza Azmat, Raya Horesh, Mikhail Yurochkin

Outline

We present a method for generating high-quality synthetic out-of-distribution (OOD) proxies by leveraging the generative capabilities of LLMs, eliminating reliance on external OOD data sources. We study the effectiveness of our method on classical text classification tasks, such as toxicity detection and sentiment classification, as well as classification tasks that arise in LLM development and deployment, such as training reward models for RLHF and detecting misaligned generations.
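As a rough illustration of the idea in the outline, the sketch below assembles a training set in which LLM-written proxies are added under an extra label. It is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the `OOD_LABEL` name, the `make_training_set` helper, and the `fake_llm` stand-in are all hypothetical, and treating the proxies as one additional class is only one plausible way to use them.

```python
from typing import Callable, List, Tuple

OOD_LABEL = "ood"  # hypothetical name for the extra class holding synthetic proxies


def build_prompt(ind_examples: List[str], n: int) -> str:
    """Build a prompt asking a generator for texts outside the task's distribution."""
    shown = "\n".join(f"- {text}" for text in ind_examples)
    return (
        f"Here are examples of inputs for a task:\n{shown}\n"
        f"Write {n} short texts, one per line, that do NOT belong to this task."
    )


def make_training_set(
    ind_data: List[Tuple[str, str]],
    llm: Callable[[str], str],
    n_proxies: int,
) -> List[Tuple[str, str]]:
    """Return the in-distribution data plus generated proxies labeled as OOD."""
    prompt = build_prompt([text for text, _ in ind_data], n_proxies)
    raw = llm(prompt)  # assumption: the generator returns one text per line
    proxies = [line.strip() for line in raw.splitlines() if line.strip()]
    return list(ind_data) + [(p, OOD_LABEL) for p in proxies[:n_proxies]]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "What is the capital of France?\nSolve 2x + 3 = 7.\n"


if __name__ == "__main__":
    ind = [("you are awful", "toxic"), ("have a nice day", "non-toxic")]
    for text, label in make_training_set(ind, fake_llm, n_proxies=2):
        print(label, "|", text)
```

Passing the generator in as a plain callable keeps the sketch independent of any particular LLM client; a real call would replace `fake_llm`.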

Takeaways, Limitations

A novel method is presented for generating synthetic data for OOD detection using LLMs.
It performs well across varied tasks, including toxicity detection, sentiment classification, RLHF reward model training, and detection of misaligned generations.
Its effectiveness is demonstrated through experiments on nine InD-OOD data pairs and various model sizes.
It achieves much lower false positive rates and higher accuracy than existing methods.
The paper does not explicitly discuss limitations.
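For readers unfamiliar with how a false positive rate is measured in OOD detection, here is a minimal sketch of one common convention: the false positive rate at a fixed true positive rate. The summary above does not say which exact metric the paper reports, so the 95% operating point, the function name `fpr_at_tpr`, and the assumption that higher scores mean "more in-distribution" are illustrative choices only.

```python
import math
from typing import Sequence


def fpr_at_tpr(
    ind_scores: Sequence[float],
    ood_scores: Sequence[float],
    tpr: float = 0.95,
) -> float:
    """Fraction of OOD samples accepted when `tpr` of InD samples are accepted.

    Assumes higher scores mean "more in-distribution".
    """
    if not ind_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(ind_scores, reverse=True)
    # Number of in-distribution samples that must score at or above the threshold.
    k = max(1, math.ceil(tpr * len(ranked)))
    threshold = ranked[k - 1]
    # An OOD sample at or above the threshold is wrongly accepted as InD.
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


if __name__ == "__main__":
    ind = [i / 20 for i in range(1, 21)]  # toy in-distribution scores
    ood = [0.0, 0.02, 0.3, 0.07]          # toy OOD scores
    print(fpr_at_tpr(ind, ood))
```

A value of zero means no OOD sample clears the threshold that still accepts the chosen share of in-distribution inputs.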