Daily Arxiv

This page curates AI-related papers published worldwide.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Street-Level AI: Are Large Language Models Ready for Real-World Judgments?

Created by
  • Haebom

Authors

Gaurab Pokharel, Shafkat Farabi, Patrick J. Fowler, Sanmay Das

Outline

This paper engages with recent research on the ethical and social implications of large AI models making "moral" judgments. Whereas prior work has focused mainly on alignment with human judgment in hypothetical thought experiments or on the group fairness of AI decisions, this paper examines one of the most immediate and likely applications of AI: assisting or replacing frontline (street-level) officials who decide how scarce social resources and benefit approvals are allocated. Drawing on the long history of how societies have designed prioritization mechanisms for scarce resources, the authors use real-world data on homeless service needs to test how closely LLM judgments align with human judgments and with the vulnerability scoring systems currently in use (only locally run models were used, to preserve data confidentiality). The analysis reveals significant inconsistency in LLM prioritization decisions along several dimensions: across implementations, across different LLMs, and between LLMs and the vulnerability scoring systems. At the same time, in pairwise comparison tests the LLMs show qualitative agreement with typical human judgments. These results suggest that current-generation AI systems are simply not ready to be integrated into high-stakes societal decision-making.
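The kind of consistency analysis described above can be illustrated with a small sketch (not the authors' code): given hypothetical priority rankings from two LLM runs and a baseline vulnerability score for the same set of cases, agreement can be quantified with a rank correlation and a pairwise concordance rate. All data values and function names below are illustrative assumptions, not the paper's actual method or results.

```python
# Illustrative sketch (not the paper's code): comparing hypothetical
# prioritization rankings from two LLM runs against a baseline
# vulnerability score using rank correlation and pairwise concordance.
from itertools import combinations
from scipy.stats import kendalltau

# Hypothetical data for 6 cases: higher score = more vulnerable.
vulnerability_score = [12, 7, 15, 9, 11, 5]   # e.g., a vulnerability-assessment score
llm_run_a = [3, 5, 1, 4, 2, 6]                # rank from LLM run A (1 = highest priority)
llm_run_b = [2, 5, 1, 6, 3, 4]                # rank from LLM run B

def pairwise_concordance(scores, ranks):
    """Fraction of case pairs where the ranking agrees with the score ordering."""
    agree, total = 0, 0
    for i, j in combinations(range(len(scores)), 2):
        score_diff = scores[i] - scores[j]        # > 0 means case i is more vulnerable
        rank_diff = ranks[j] - ranks[i]           # > 0 means case i is ranked higher (smaller rank)
        if score_diff == 0:
            continue                              # skip ties in the baseline score
        total += 1
        if (score_diff > 0) == (rank_diff > 0):
            agree += 1
    return agree / total if total else float("nan")

# Agreement of each LLM run with the vulnerability score.
for name, ranks in [("run A", llm_run_a), ("run B", llm_run_b)]:
    tau, _ = kendalltau(vulnerability_score, [-r for r in ranks])  # negate: rank 1 = top priority
    print(f"{name}: Kendall tau vs. score = {tau:.2f}, "
          f"pairwise concordance = {pairwise_concordance(vulnerability_score, ranks):.2f}")

# Agreement between the two LLM runs (consistency across runs/implementations).
tau_runs, _ = kendalltau(llm_run_a, llm_run_b)
print(f"run A vs. run B: Kendall tau = {tau_runs:.2f}")
```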

Takeaways, Limitations

Takeaways: Current LLMs are not reliable enough for direct use in high-stakes societal decisions such as allocating scarce resources. Although their judgments agree with human judgment in some respects, they lack internal consistency and consistency with existing vulnerability scoring systems. Evaluating on real-world data provides concrete evidence about the practical applicability of AI systems in this setting.
Limitations: The study is restricted to a single domain (resource allocation for homeless services), so generalization to other societal decision-making settings is uncertain. Because only locally run models were used to preserve data confidentiality, the possibility that model choice influenced the results cannot be ruled out. The qualitative agreement observed in pairwise comparisons is a more subjective measure than the quantitative consistency analyses.