
Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Detecting Benchmark Contamination Through Watermarking

Created by
  • Haebom

Author

Tom Sander, Pierre Fernandez, Saeed Mahloujifar, Alain Durmus, Chuan Guo

Outline

This paper proposes applying watermarking to benchmarks in order to address benchmark contamination, a serious threat to the reliability of large language model (LLM) evaluation. The watermark is embedded by rephrasing each original question with a watermarking LLM, without compromising the benchmark's usability. At evaluation time, a theoretically grounded statistical test detects "radioactivity", the trace that watermarked text leaves behind during model training. To validate the approach, 1-billion-parameter models are pre-trained from scratch on 10 billion tokens, and contamination detection is tested on ARC-Easy, ARC-Challenge, and MMLU. The results show that benchmark utility is preserved after watermarking, and contamination is reliably detected whenever it is strong enough to improve performance (e.g., a +5% improvement on ARC-Easy yields a p-value of 10⁻³).
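To make the idea of the statistical test concrete, here is a minimal sketch of a red/green-list-style watermark detection test, in the spirit of common LLM watermarking schemes. This is an illustrative assumption, not the paper's exact statistic: it assumes each generated token falls into a hidden "green list" with chance probability gamma, and computes a one-sided p-value that a model emits green tokens more often than chance, via a normal approximation to the binomial distribution. A model trained on watermarked benchmark text would over-produce green tokens, driving the p-value toward zero.

```python
import math

def greenlist_pvalue(green_hits: int, total: int, gamma: float = 0.5) -> float:
    """One-sided p-value for the null hypothesis that the model emits
    'green-list' tokens at the chance rate gamma (i.e., no contamination).

    Uses a normal approximation to Binomial(total, gamma); a tiny p-value
    suggests the model has absorbed the watermark during training.
    Note: illustrative sketch only -- the paper's actual test statistic
    and watermarking scheme may differ.
    """
    mean = gamma * total
    std = math.sqrt(total * gamma * (1.0 - gamma))
    z = (green_hits - mean) / std
    # Upper-tail probability of a standard normal variable exceeding z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# An uncontaminated model scores near gamma: large p-value, no detection.
p_clean = greenlist_pvalue(green_hits=510, total=1000)
# A contaminated model over-produces green tokens: p-value collapses.
p_contaminated = greenlist_pvalue(green_hits=600, total=1000)
```

In this framing, "radioactivity" detection reduces to counting how often a suspect model's outputs land on the watermark's green lists and rejecting the no-contamination null when the count is statistically implausible.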

Takeaways, Limitations

Takeaways:
Presents a novel contamination-prevention technique that can increase the reliability of LLM evaluation.
Demonstrates that watermarking can effectively detect contamination while preserving the usability of the benchmark.
The proposed statistical test enables a quantitative, principled determination of contamination.
Limitations:
The method has been verified only at a particular model and dataset scale; its generalizability to other scales requires further study.
Further analysis is needed of the impact of the watermarking itself on downstream model performance.
New contamination techniques may emerge that bypass the watermark and evade detection.