Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Decoding AI Judgment: How LLMs Assess News Credibility and Bias

Created by
  • Haebom

Authors

Edoardo Loru, Jacopo Nudo, Niccolò Di Marco, Alessandro Santirocchi, Roberto Atzeni, Matteo Cinelli, Vincenzo Cestari, Clelia Rossi-Arnaud, Walter Quattrociocchi

Outline

As large language models (LLMs) are increasingly integrated into workflows that involve evaluation, this paper investigates how those evaluations are constructed, what assumptions they rest on, and how they differ from human strategies. The study benchmarks six LLMs against expert ratings from NewsGuard and Media Bias/Fact Check (MBFC) and against human judgments collected in controlled experiments. It implements a structured, goal-oriented framework in which both models and non-expert participants follow the same evaluation procedure (criteria selection, content retrieval, and justification generation), enabling direct comparison. Although the models' outputs are consistent with expert ratings, they rely on different mechanisms: lexical associations and statistical priors stand in for contextual inference. This reliance produces systematic effects, including political asymmetry, opaque justifications, and a tendency to mistake linguistic form for epistemic validity. Delegating judgment to LLMs therefore does not simply automate evaluation; it redefines it, from normative reasoning to pattern-based approximation.
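Because the shared procedure is explicit (criteria selection, content retrieval, justification generation), it is easy to picture as a small benchmarking harness. The sketch below is a hypothetical Python illustration, not the authors' code: the `llm` callable, the prompts, the label parsing, and the `EXPERT_RATINGS` data are all invented for illustration.

```python
# Minimal sketch of the three-step evaluation procedure, scored against
# expert labels. Prompts, parsing, and data are illustrative assumptions;
# the paper benchmarks against NewsGuard and MBFC ratings.

# Hypothetical expert labels keyed by outlet.
EXPERT_RATINGS = {
    "example-outlet.com": {"newsguard": "credible"},
    "another-outlet.com": {"newsguard": "not credible"},
}

def evaluate_source(llm, outlet):
    """Run one outlet through criteria selection, content retrieval,
    and justification generation. `llm` is any callable that maps a
    prompt string to a text response."""
    criteria = llm(f"List the criteria you would use to assess the credibility of {outlet}.")
    evidence = llm(f"Summarize what you know about the content published by {outlet}.")
    justification = llm(
        "Using these criteria:\n" + criteria + "\n"
        "and this evidence:\n" + evidence + "\n"
        f"Rate {outlet} as 'credible' or 'not credible' and justify the rating."
    )
    # Crude label extraction; check the negated form first so that
    # "not credible" is not misread as "credible".
    text = justification.lower()
    verdict = "not credible" if "not credible" in text else "credible"
    return {"outlet": outlet, "criteria": criteria,
            "justification": justification, "verdict": verdict}

def agreement_with_experts(records, ratings):
    """Fraction of model verdicts matching the NewsGuard label."""
    hits = sum(1 for r in records if r["verdict"] == ratings[r["outlet"]]["newsguard"])
    return hits / len(records)

if __name__ == "__main__":
    # Stub model for demonstration; swap in a real LLM call to reproduce
    # the model side of the comparison. Human participants would follow
    # the same three steps, enabling the direct comparison the paper makes.
    stub = lambda prompt: "This outlet appears credible based on its sourcing."
    records = [evaluate_source(stub, outlet) for outlet in EXPERT_RATINGS]
    print(agreement_with_experts(records, EXPERT_RATINGS))
```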

Takeaways, Limitations

Takeaways: By clearly demonstrating the systematic biases and limitations that arise when LLMs are integrated into assessment processes, the work invites a deeper discussion of the reliability and ethical implications of LLM-based assessment systems. A better understanding of how LLMs form judgments can, in turn, inform the design of more accurate and fair assessment systems.
Limitations: The study covers a specific set of LLMs and assessment tools, so its findings may not generalize to other models or evaluation domains. The subjectivity and inconsistency of human judgment cannot be fully ruled out. And because the internal mechanisms of LLMs are not fully explained, further research is needed to identify the root causes of the observed systematic biases.