[공지사항]을 빙자한 안부와 근황 
Show more

Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude

Created by
  • Haebom

Author

Yile Yan, Yuqi Zhu, Wentao Xu

Outline

This study systematically evaluated the ethical decision-making ability and potential bias of large-scale language models (LLMs) using two models, GPT-3.5 Turbo and Claude 3.5 Sonnet, to assess their responses to ethical dilemmas. We analyzed the ethical preferences, sensitivity, stability, and clustering of preferences of the models across 11,200 experiments that included multiple protected attributes, including age, gender, race, appearance, and disability status. The results revealed consistent preferences for certain attributes (e.g., “good-looking”) and systematic disregard for other attributes in both models. GPT-3.5 Turbo showed strong preferences consistent with existing power structures, while Claude 3.5 Sonnet showed a wider range of protected attribute choices. Furthermore, we found that ethical sensitivity decreased significantly in more complex scenarios involving multiple protected attributes. We found that linguistic references significantly influenced the models’ ethical evaluations, as evidenced by their different responses to racial descriptors (“Yellow” vs. “Asian”). This study highlights important concerns about the potential impact of LLM bias in autonomous decision-making systems and emphasizes the need to carefully consider protective properties in AI development.

Takeaways, Limitations

Takeaways:
Providing a systematic evaluation framework for ethical decision-making in LLM.
Confirming the presence of bias for protection properties in GPT-3.5 Turbo and Claude 3.5 Sonnet.
We found a phenomenon of reduced ethical sensitivity in complex scenarios.
To determine the influence of linguistic expressions on ethical judgments in LLM.
Emphasizes the importance of considering protective properties when developing AI.
Limitations:
Limited number of models used in the study.
Difficulties in interpretation due to the complexity of the experimental design.
Lack of consideration of diverse linguistic and cultural contexts.
Lack of evaluation of long-term ethical implications.
👍