Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated

Created by
  • Haebom

Authors

Tiffany Zhu, Iain Weissburg, Kexun Zhang, William Yang Wang

Outline

This paper explores whether human trust in AI-generated text is constrained by biases that go beyond concerns about accuracy. The authors examined how human raters respond to labeled and unlabeled content across three experiments: text editing, news article summarization, and persuasive writing. Although raters could not reliably distinguish the two types of text in blind tests, they preferred content labeled "human-generated" over content labeled "AI-generated" by a margin of more than 30%. The same pattern held even when the labels were deliberately swapped. This human bias against AI has broader societal and cognitive implications, such as the underestimation of AI performance. The study highlights the limitations of human judgment when interacting with AI and provides a foundation for improving human-AI collaboration, particularly in creative fields.

Takeaways, Limitations

Takeaways: This study demonstrates the existence of a human bias against AI-generated content, in which evaluations are influenced more by labels than by the actual quality of the AI's output. This highlights the need for research on improving human-AI collaboration and suggests that societal perceptions of the role and use of AI, particularly in creative fields, need to shift.
Limitations: The analysis of how bias varies with participant characteristics (age, occupation, prior knowledge of AI, etc.) is insufficient. Further research is needed using a broader range of AI-generated content and evaluation methods. Because the study examined only label-induced bias, it may not have adequately accounted for the impact of qualitative differences in the content itself on the evaluation results.