Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs

Created by
  • Haebom

Authors

Pengrui Han, Rafal Kocielnik, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, R. Michael Alvarez

Outline

This paper systematically analyzes personality traits in large language models (LLMs) along three dimensions: the dynamics of trait expression across training stages, the predictive validity of self-reported traits, and the impact of interventions such as persona injection. The findings show that instructional alignment (e.g., RLHF) stabilizes trait expression and strengthens trait correlations in ways that resemble human data. However, self-reported traits do not reliably predict behavior, and the observed associations often diverge from human patterns. Persona injection successfully steers self-reports in the intended direction but has little or inconsistent effect on actual behavior. By distinguishing surface-level trait expression from behavioral consistency, the paper challenges assumptions about personality in LLMs and highlights the need for deeper evaluation in alignment and interpretability.
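As a rough illustration of what "predictive validity of self-reported traits" means in practice, the sketch below correlates a self-reported trait score with a behavioral measure across several models. It is not the paper's method or data: the `pearson` helper, the variable names, and all numbers are hypothetical and exist only to show the shape of such a check.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up scores for five models (NOT data from the paper).
self_report = [4.5, 3.8, 4.1, 2.9, 3.3]    # questionnaire score for one trait
behavior = [0.40, 0.55, 0.35, 0.50, 0.45]  # rate of a trait-linked behavior

r = pearson(self_report, behavior)
print(f"self-report vs. behavior: r = {r:.2f}")
```

A coefficient near zero, or one whose sign is opposite to what human studies lead one to expect, would be the kind of dissociation the summary describes. A real analysis would also need many more observations and a significance test.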

Takeaways, Limitations

Takeaways:
Instructional alignment during LLM training plays a crucial role in making the expression of personality traits more stable and consistent.
LLMs' self-reported personality traits do not reliably predict their actual behavior.
Interventions such as persona injection shift LLM self-reports but have limited effect on actual behavior.
Limitations:
The LLMs' personality traits may have been assessed only through self-reports and behavioral observations. More diverse and sophisticated assessment methods may be needed.
Results may vary depending on which LLMs were studied and on the characteristics of their training data. Further research is needed to determine generalizability.
The complex relationship between personality traits and behavior in LLMs may not be fully explained. Further analysis and interpretation are required.