Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

Towards Understanding Bias in Synthetic Data for Evaluation

Created by
  • Haebom

Author

Hossein A. Rahmani, Varsha Ramineni, Emine Yilmaz, Nick Craswell, Bhaskar Mitra

Outline

This paper studies the reliability of synthetic test collections generated with large language models (LLMs). We investigate potential biases in test collections where LLMs generate the queries, the relevance labels, or both, and analyze their impact on system evaluation. The results show that such bias is present and can distort absolute measurements of system performance, but it appears to have a smaller effect when comparing systems' relative performance.
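The distinction between absolute and relative evaluation can be illustrated with a small sketch. The Python snippet below uses made-up scores (not taken from the paper) for four hypothetical systems to show how synthetic labels might inflate absolute metric values while leaving the system ranking, and hence relative comparisons, unchanged.

```python
# Hypothetical illustration (not from the paper): toy effectiveness scores for
# four systems under human-judged qrels vs. LLM-generated (synthetic) qrels.
# All numbers are made up for demonstration purposes.
from itertools import combinations

human_scores = {"sysA": 0.52, "sysB": 0.48, "sysC": 0.41, "sysD": 0.35}
synthetic_scores = {"sysA": 0.61, "sysB": 0.58, "sysC": 0.49, "sysD": 0.44}

def kendall_tau(x, y):
    """Kendall rank correlation between two equal-length score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

systems = list(human_scores)
h = [human_scores[s] for s in systems]
g = [synthetic_scores[s] for s in systems]

# Absolute measurements shift: synthetic labels inflate every score here.
mean_shift = sum(gi - hi for hi, gi in zip(h, g)) / len(systems)
print(f"Mean absolute score shift: {mean_shift:+.3f}")

# Relative comparison is preserved: the system ranking is identical (tau = 1.0).
print(f"Kendall's tau between rankings: {kendall_tau(h, g):.2f}")
```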

Takeaways, Limitations

  • Synthetic test collections generated with LLMs can introduce bias into system evaluation.
  • This bias can affect absolute measurements of system performance.
  • For comparisons of relative system performance, the bias appears to have a smaller impact.
  • Further analysis is needed to validate how usable synthetic test collections are in practice.