Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Population-Aligned Persona Generation for LLM-based Social Simulation

Created by
  • Haebom

Author

Zhengyu Hu, Jianxun Lian, Zheyuan Xiao, Max Xiong, Yuxuan Lei, Tianfu Wang, Kaize Ding, Ziang Xiao, Nicholas Jing Yuan, Xing Xie

Outline

Advances in large language models (LLMs) have enabled human-like social simulations at unprecedented scale and fidelity. However, constructing persona sets that authentically represent the diversity and distribution of real-world populations remains a critical challenge. This paper proposes a systematic framework for synthesizing high-quality, population-aligned persona sets for LLM-based social simulation. The framework first uses LLMs to generate narrative personas from long-term social media data, filtering out low-fidelity profiles through quality assessment. Importance sampling is then applied to align the persona set globally to reference psychometric distributions, such as the Big Five personality traits. To meet the needs of specific simulation contexts, task-specific modules adapt the globally aligned persona set to target subpopulations. Extensive experiments demonstrate that the methodology significantly reduces population-level bias and enables accurate, flexible social simulations with broad research and policy applications.
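
The global alignment step described above amounts to importance resampling: each generated persona is weighted by the ratio of the reference trait density to the empirical density of the generated pool, and the pool is then resampled according to those weights. Below is a minimal sketch of this idea in Python. The single-trait setup, the Gaussian reference, and all variable names are illustrative assumptions, not details taken from the paper.

```python
# Minimal importance-resampling sketch (illustrative; not the paper's implementation).
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)

# Toy pool of generated personas: one Big Five trait score (e.g., extraversion) in [0, 1].
persona_traits = rng.beta(a=2.0, b=5.0, size=5_000)  # skewed, as a raw LLM-generated pool might be

# Hypothetical reference psychometric distribution for that trait.
reference = norm(loc=0.5, scale=0.15)

# Importance weights: reference density / empirical (proposal) density of the pool.
proposal_density = gaussian_kde(persona_traits)
weights = reference.pdf(persona_traits) / proposal_density(persona_traits)
weights /= weights.sum()

# Resample personas with these weights to obtain a population-aligned subset.
aligned_idx = rng.choice(len(persona_traits), size=2_000, replace=True, p=weights)
aligned_traits = persona_traits[aligned_idx]

print("raw mean:", round(persona_traits.mean(), 3),
      "aligned mean:", round(aligned_traits.mean(), 3))
```

In practice the paper aligns to multivariate psychometric distributions (all Big Five traits) rather than a single trait, but the weighting-and-resampling logic is the same.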

Takeaways, Limitations

Takeaways:
Presents a framework for synthesizing high-quality, population-aligned persona sets for LLM-based social simulation.
Generates realistic narrative personas from long-term social media data.
Aligns persona sets to reference psychometric distributions (e.g., Big Five) via importance sampling.
Adapts persona sets to specific simulation contexts through task-specific modules.
Reduces population-level bias, enabling accurate and flexible social simulations.
Limitations:
Specific limitations are not stated in the summary.