Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Do Biased Models Have Biased Thoughts?

Created by
  • Haebom

Authors

Swati Rajwal, Shivank Garg, Reem Abdel-Salam, Abdelrahman Zayed

Outline

This paper investigates the fairness of large language models (LLMs) using Chain-of-Thought prompting, a technique that has recently attracted attention. LLM outputs contain various biases, including those based on gender, race, socioeconomic status, appearance, and sexual orientation. The authors measure the presence and extent of bias not only in the final output but also in the thinking steps the model produces under Chain-of-Thought prompting. Quantifying 11 biases across five popular LLMs, they find that bias in the thinking steps is not strongly correlated with bias in the final output (correlation coefficients below 0.6, p-value < 0.001). This suggests that, unlike humans, models that make biased decisions do not always exhibit biased thought processes.
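To make the correlation figure concrete, here is a minimal sketch of how a correlation between thought-level and output-level bias scores could be computed. It assumes a Pearson coefficient and per-example bias scores between 0 and 1. The summary states neither the coefficient type nor the scoring scheme, and the numbers below are made up for illustration, not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical bias scores for the same six prompts (illustrative only):
# one score for the model's thinking steps, one for its final answer.
thought_bias = [0.10, 0.40, 0.35, 0.80, 0.20, 0.60]
output_bias = [0.30, 0.20, 0.70, 0.60, 0.50, 0.40]

r = pearson(thought_bias, output_bias)
print(f"correlation between thought bias and output bias: r = {r:.2f}")
```

A value of r near 1 would mean that biased thinking steps reliably accompany biased outputs. The paper reports values below 0.6.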

Takeaways, Limitations

Takeaways:
We present an approach that uses Chain-of-Thought prompting to examine bias in a model's thinking steps, not only in its final output.
By revealing a low correlation between output bias and thought-process bias in LLMs, we offer a new perspective on existing bias-mitigation approaches.
The finding that biased decisions in LLMs, unlike in humans, are not always accompanied by biased thought processes may inform new strategies for addressing bias in LLMs.
Limitations:
The analysis covers five LLMs and 11 bias types, so the findings may not generalize to other models or biases.
It remains to be verified whether the thought process elicited through Chain-of-Thought prompting faithfully reflects the model's actual internal workings.
A low correlation does not rule out a causal relationship, so further research is needed on how thought-process bias and output bias are related.
Further analysis is needed to determine whether a correlation below 0.6 is low enough to be treated as negligible.
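
One way to read the last limitation is through the coefficient of determination. This is a rough worked bound, and it assumes the reported value is a Pearson coefficient r, which the summary does not specify.

```latex
|r| < 0.6 \;\Longrightarrow\; r^2 < 0.6^2 = 0.36
```

Under that assumption, less than 36% of the variance in output bias is linearly associated with thought-process bias. That is too much to dismiss as negligible, but far from a tight coupling.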