This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
Language model developers should report train-test overlap
Created by
Haebom
Author
Andy K Zhang, Kevin Klyman, Yifan Mai, Yoav Levine, Yian Zhang, Rishi Bommasani, Percy Liang
Outline
This paper addresses train-test overlap, the contamination of public test sets by training data, as a threat to the reliability of language model evaluation. It points out that train-test overlap is difficult to measure because most language model developers currently disclose only evaluation results, not their training data. The research team surveyed 30 model developers to assess how train-test overlap information is disclosed and found that only 9 of them release such information. The paper argues that language model developers should report train-test overlap statistics and/or disclose their training data whenever they report evaluation results on public test sets.
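For illustration only, the sketch below shows one common way such an overlap statistic can be computed: word-level n-gram matching between test examples and a training corpus. The function names, the 8-gram window, and the "any shared n-gram" criterion are assumptions made for this sketch, not a method prescribed by the paper.

```python
from typing import Iterable, List, Set, Tuple

def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of word-level n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_rate(test_examples: List[str], training_docs: Iterable[str], n: int = 8) -> float:
    """Fraction of test examples that share at least one n-gram with the training corpus."""
    # Index every n-gram seen anywhere in the training data.
    train_ngrams: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)

    # Flag a test example as overlapping if any of its n-grams appears in the index.
    flagged = sum(1 for ex in test_examples if ngrams(ex, n) & train_ngrams)
    return flagged / len(test_examples) if test_examples else 0.0

if __name__ == "__main__":
    train = ["the quick brown fox jumps over the lazy dog near the old river bank"]
    test = [
        "the quick brown fox jumps over the lazy dog near the old river",  # overlaps with training data
        "what is the capital of france",                                   # no overlap
    ]
    # Report the overlap statistic alongside any evaluation results.
    print(f"train-test overlap: {overlap_rate(test, train):.0%}")
```

Real contamination audits operate over far larger corpora and typically use hashed n-gram indexes or suffix arrays rather than an in-memory set, but the reported statistic, the fraction of test examples with detected overlap, takes the same form.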
Takeaways, Limitations
•
Takeaways:
◦
Emphasizes the importance of disclosing train-test overlap information for reliable language model evaluation.
◦
Exposes the lack of transparency in current language model evaluation practices.
◦
Documents the current state of disclosure in detail through a survey of 30 model developers.
◦
Encourages developers to voluntarily disclose train-test overlap information.
•
Limitations:
◦
The low survey response rate may limit the generalizability of the findings.
◦
The paper offers little concrete methodological guidance on how to measure and disclose train-test overlap.
◦
Because only 30 model developers were surveyed, the findings are difficult to generalize to all language models.