CMU ML · Language Technologies Institute · UChicago · UTokyo · Apr 27, 2026 · arXiv:2604.24698

The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

Yunze Xiao, Vivian Zhang, Chenghao Yang, Ning Ma, Weihao Xuan, Jen-tse Huang

AI Summary

The paper identifies and quantifies "Persona Collapse" in LLMs: agents assigned distinct profiles converge to homogeneous behavior in multi-agent simulations. The authors introduce a framework that measures Coverage, Uniformity, and Complexity to evaluate persona collapse across ten LLMs and three tasks. Results show that persona collapse varies both across measurement dimensions and across domains (e.g., personality vs. moral reasoning), and that models with higher per-persona fidelity paradoxically produce more stereotyped populations.
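The summary names Coverage and Uniformity but does not give their formulas. As a minimal sketch, assume each agent's output on one behavioral dimension reduces to a single score on a known scale (for instance, a BFI-44 trait score on a 1 to 5 scale). Coverage can then be approximated as the fraction of equal-width bins that at least one agent occupies, and Uniformity as the Shannon entropy of bin occupancy normalized by log(n_bins). The functions `bin_index` and `coverage_and_uniformity` and this binning scheme are hypothetical illustrations, not the paper's definitions; Complexity is not sketched here.

```python
import math
from collections import Counter
from collections.abc import Sequence


def bin_index(x: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a score in [lo, hi] to one of n_bins equal-width bins (hypothetical helper)."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    frac = (x - lo) / (hi - lo)
    # Clamp so that x == hi falls in the last bin instead of overflowing.
    return min(max(int(frac * n_bins), 0), n_bins - 1)


def coverage_and_uniformity(
    scores: Sequence[float], lo: float, hi: float, n_bins: int = 10
) -> tuple[float, float]:
    """Return (coverage, uniformity) for one behavioral dimension.

    Assumed definitions, not the paper's:
      coverage   = fraction of bins occupied by at least one agent
      uniformity = Shannon entropy of bin occupancy / log(n_bins)
    """
    if n_bins < 2:
        raise ValueError("need at least two bins")
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(bin_index(s, lo, hi, n_bins) for s in scores)
    n = len(scores)
    coverage = len(counts) / n_bins
    # p * log(1/p) summed over occupied bins; each term is non-negative.
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())
    uniformity = entropy / math.log(n_bins)
    return coverage, uniformity


collapsed = [3.0, 3.1, 3.2, 3.05]  # every agent lands near the scale midpoint
spread = [1.0, 2.0, 3.0, 4.5]      # agents spread across the whole scale
print(coverage_and_uniformity(collapsed, lo=1.0, hi=5.0, n_bins=4))  # (0.25, 0.0)
print(coverage_and_uniformity(spread, lo=1.0, hi=5.0, n_bins=4))
```

Under these assumptions, a population whose scores all fall in one bin scores low on both measures even if each individual persona looks plausible in isolation, which is the population-level failure the paper targets.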

Key Contribution

LLMs that faithfully reproduce individual personas can still fail badly at generating diverse populations, defaulting instead to coarse stereotypes.

Abstract

Applications based on large language models (LLMs), such as multi-agent simulations, require population diversity among agents. We identify a pervasive failure mode we term Persona Collapse: agents each assigned a distinct profile nonetheless converge into a narrow behavioral mode, producing a homogeneous simulated population. To quantify persona collapse, we propose a framework that measures how much of the persona space a population occupies (Coverage), how evenly agents spread across it (Uniformity), and how rich the resulting behavioral patterns are (Complexity). Evaluating ten LLMs on personality simulation (BFI-44), moral reasoning, and self-introduction, we observe persona collapse along two axes: (1) Dimensions: a model can appear diverse on one axis yet structurally degenerate on another, and (2) Domains: the same model may collapse the most in personality yet be the most diverse in moral reasoning. Furthermore, item-level diagnostics reveal that behavioral variation tracks coarse demographic stereotypes rather than the fine-grained individual differences specified in each persona. Counter-intuitively, the models achieving the highest per-persona fidelity consistently produce the most stereotyped populations. We release our toolkit and data to support population-level evaluation of LLMs.
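The abstract reports that item-level behavioral variation tracks coarse demographic stereotypes. One standard statistic for checking this kind of claim, used here as an assumed stand-in rather than the paper's stated diagnostic, is eta-squared: the share of variance in agents' responses to a single item that is explained by a coarse group label such as gender. The function `eta_squared` and the example labels are hypothetical.

```python
import math
from collections import defaultdict
from collections.abc import Hashable, Sequence


def eta_squared(responses: Sequence[float], groups: Sequence[Hashable]) -> float:
    """Share of response variance explained by a coarse group label (hypothetical helper).

    eta^2 = SS_between / SS_total. Values near 1 mean agents sharing a label
    answer almost identically; values near 0 mean variation is mostly within
    groups, i.e. driven by finer persona details than the label.
    """
    if len(responses) != len(groups) or not responses:
        raise ValueError("need equal-length, non-empty inputs")
    grand_mean = sum(responses) / len(responses)
    ss_total = sum((r - grand_mean) ** 2 for r in responses)
    if ss_total == 0:
        return math.nan  # every agent gave the same answer: ratio is undefined
    by_group: dict[Hashable, list[float]] = defaultdict(list)
    for r, g in zip(responses, groups):
        by_group[g].append(r)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in by_group.values()
    )
    return ss_between / ss_total


labels = ["F", "F", "F", "M", "M", "M"]
stereotyped = [4, 4, 4, 2, 2, 2]  # answers split cleanly by label
individual = [5, 1, 3, 5, 1, 3]   # answers vary within each label
print(eta_squared(stereotyped, labels))  # 1.0
print(eta_squared(individual, labels))   # 0.0
```

Under this assumed diagnostic, values near 1 across many items would be consistent with the abstract's finding: the coarse label predicts the answer, and the persona's fine-grained attributes add little.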

Eval Frameworks & Benchmarks · Natural Language Processing · Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 49
Year: 2026
Venue: N/A
