Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
The summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews

Created by
  • Haebom

Author

Sai Suresh Marchala Vasu, Ivaxi Sheth, Hui-Po Wang, Ruta Binkyte, Mario Fritz

Outline

This paper investigates the impact of large language models (LLMs) on the peer review process, focusing on concerns about fairness and trustworthiness. Through controlled experiments that inject sensitive metadata such as author affiliation and gender, the authors analyze the biases LLMs introduce into peer review. The analysis shows that LLMs consistently exhibit affiliation bias favoring institutions ranked highly in mainstream academic rankings, along with subtle gender preferences that could intensify over time. Implicit biases are found to be even more pronounced in token-based soft ratings.
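The "token-based soft ratings" mentioned above are, as commonly understood, scores derived from the probabilities the model assigns to candidate rating tokens rather than from a single sampled rating. The snippet below is a minimal sketch of this idea, not the paper's implementation; the `score_logprobs` values and the `soft_rating` helper are hypothetical and stand in for log-probabilities read from an LLM's next-token distribution at the rating position.

```python
import math

# Hypothetical log-probabilities an LLM assigns to each candidate rating token
# (e.g., taken from the next-token distribution after a "Rating:" prompt).
score_logprobs = {"1": -4.2, "2": -3.1, "3": -1.9, "4": -0.9, "5": -2.3}

def soft_rating(logprobs: dict[str, float]) -> float:
    """Probability-weighted (soft) rating over the candidate score tokens."""
    # Renormalize over the rating tokens only, since they are a small
    # subset of the model's full vocabulary.
    probs = {tok: math.exp(lp) for tok, lp in logprobs.items()}
    total = sum(probs.values())
    return sum(int(tok) * p / total for tok, p in probs.items())

print(f"soft rating: {soft_rating(score_logprobs):.2f}")  # ~3.74 for these numbers
```

Because such a score aggregates the whole distribution instead of only the most likely token, small probability shifts caused by metadata like affiliation or gender can surface in it even when the hard (argmax) rating stays unchanged, which is consistent with the paper's finding that implicit bias is more visible in soft ratings.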

Takeaways, Limitations

Takeaways: The paper clearly raises the fairness and trustworthiness issues of LLM-assisted peer review and experimentally demonstrates the presence of affiliation and gender bias. The finding that implicit bias is more pronounced in token-based soft ratings offers a concrete direction for improving such systems.
Limitations: The analysis covers only specific metadata (affiliation and gender); further research is needed on other sensitive attributes such as race and nationality. Because of constraints in the experimental design and dataset, generalizations about the degree and impact of the bias should be made with caution. The study also gives little consideration to bias arising not from the LLM itself but from the design and operation of the peer review systems that use it.