Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation

Created by
  • Haebom

Author

Joachim Baumann, Paul Röttger, Aleksandra Urman, Albert Wendsjö, Flor Miriam Plaza-del-Arco, Johannes B. Gruber, Dirk Hovy

Outline

While large language models (LLMs) enable the automation of data annotation in social science research, their outputs can vary substantially with researcher choices (e.g., model selection, prompt strategy). This variability introduces systematic bias and random error into downstream analyses, producing Type I, Type II, Type S (sign), and Type M (magnitude) errors; the authors call this phenomenon LLM hacking. Intentional LLM hacking is strikingly simple: in a replication of 37 data annotation tasks, merely rephrasing prompts was enough to produce statistically significant results. Moreover, an analysis of 13 million labels from 18 LLMs across 2,361 realistic hypotheses reveals a high risk of accidental LLM hacking even when standard research practices are followed: state-of-the-art LLMs lead to incorrect conclusions for roughly 31% of hypotheses, and small language models for about half. The risk decreases as effect sizes grow, and human annotations prove critical for preventing false positives. The paper closes with practical recommendations for mitigating LLM hacking risk.
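Below is a minimal, hypothetical simulation (not from the paper) of the mechanism behind accidental LLM hacking: when an LLM annotator's errors are correlated with the grouping variable being studied, standard significance tests on the LLM labels produce inflated false-positive rates even though the true group difference is zero. All error rates and sample sizes are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha = 1000, 500, 0.05  # documents per group, simulated studies, significance level

def annotate(truth, fpr, fnr):
    """Hypothetical LLM annotator with fixed false-positive/false-negative rates."""
    flip_neg = rng.binomial(1, fpr, truth.size)        # 0 -> 1 errors
    keep_pos = rng.binomial(1, 1.0 - fnr, truth.size)  # a true 1 stays 1 with prob 1 - fnr
    return np.where(truth == 1, keep_pos, flip_neg)

def false_positive_rate(fpr_a, fnr_a, fpr_b, fnr_b):
    """Share of simulated studies that find a 'significant' group difference
    although the true label rate (0.5) is identical in both groups."""
    hits = 0
    for _ in range(reps):
        truth_a = rng.binomial(1, 0.5, n)
        truth_b = rng.binomial(1, 0.5, n)
        a = annotate(truth_a, fpr_a, fnr_a)
        b = annotate(truth_b, fpr_b, fnr_b)
        table = [[a.sum(), n - a.sum()], [b.sum(), n - b.sum()]]
        _, p, _, _ = stats.chi2_contingency(table)
        hits += p < alpha
    return hits / reps

# Errors independent of group: false positives stay near the nominal 5%.
print("group-independent errors:", false_positive_rate(0.05, 0.05, 0.05, 0.05))
# Errors correlated with group (e.g., a prompt that over-labels one group):
# false positives are inflated well above 5%.
print("group-correlated errors :", false_positive_rate(0.05, 0.05, 0.15, 0.05))
```

The second configuration differs only in how the annotator errs on one group, yet it yields spurious "findings" far more often than the nominal 5% — the kind of Type I risk the paper quantifies at scale.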

Takeaways, Limitations

Takeaways:
LLMs can accelerate social science research, but results can vary significantly with researcher choices such as model selection and prompt strategy.
Incorrect conclusions arise not only from intentional manipulation; accidental LLM hacking occurs even when standard research practices are followed.
Improved LLM performance reduces, but does not eliminate, the risk of LLM hacking.
Smaller effect sizes carry a higher risk of LLM hacking, so LLM-based results near significance thresholds should be verified especially rigorously.
Human annotations are effective at preventing false positives, while regression-estimator correction techniques trade one error type off against another (a minimal sketch of one such correction appears after this list).
The paper presents practical recommendations for preventing LLM hacking.
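As a concrete illustration of the correction mentioned above, here is a minimal sketch of one simple approach: a difference estimator that debiases the LLM-only estimate of a label's prevalence using a small human-annotated subsample (in the spirit of prediction-powered inference; the estimators evaluated in the paper may differ, and all numbers below are invented).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5000, 200  # corpus size, human-audited subsample size

# Simulated ground truth (unknown in practice; used here only to check the estimates).
truth = rng.binomial(1, 0.30, n)

# Hypothetical LLM labels with asymmetric errors (recall 0.90, false-positive rate 0.10).
llm = np.where(truth == 1,
               rng.binomial(1, 0.90, n),
               rng.binomial(1, 0.10, n))

# A random subsample is re-annotated by humans (assumed correct here).
idx = rng.choice(n, size=m, replace=False)
human = truth[idx]

naive = llm.mean()                                         # biased LLM-only estimate
corrected = llm.mean() - (llm[idx].mean() - human.mean())  # difference estimator

print(f"true label rate : {truth.mean():.3f}")
print(f"LLM-only        : {naive:.3f}")
print(f"bias-corrected  : {corrected:.3f}")
```

The corrected estimate removes the LLM-only bias but inherits the noise of the small human subsample, which is one way such corrections trade false positives for reduced statistical power.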
Limitations:
Specific LLM hacking prevention techniques are not described in detail.
The effectiveness of the proposed mitigation techniques is not analyzed quantitatively.
The study may be limited to particular social science tasks; further research is needed to establish generalizability to other fields.