The paper introduces TwinVoice, a multi-dimensional benchmark designed to evaluate the persona simulation capabilities of Large Language Models (LLMs) across social, interpersonal, and narrative contexts. TwinVoice decomposes persona simulation into six fundamental capabilities, including opinion consistency, memory recall, and syntactic style, providing a granular assessment framework. Experiments using TwinVoice reveal that while LLMs demonstrate moderate accuracy, they significantly underperform in areas like syntactic style and memory recall compared to human baselines, highlighting areas for future research.
LLMs still can't convincingly mimic human personas, especially when it comes to syntactic style and memory, despite advancements in other areas.
Large Language Models (LLMs) are exhibiting emergent human-like abilities and are increasingly envisioned as the foundation for simulating an individual's communication style, behavioral tendencies, and personality traits. However, current evaluations of LLM-based persona simulation remain limited: most rely on synthetic dialogues, lack systematic frameworks, and lack analysis of the underlying capability requirements. To address these limitations, we introduce TwinVoice, a comprehensive benchmark for assessing persona simulation across diverse real-world contexts. TwinVoice encompasses three dimensions: Social Persona (public social interactions), Interpersonal Persona (private dialogues), and Narrative Persona (role-based expression). It further decomposes the evaluation of LLM performance into six fundamental capabilities: opinion consistency, memory recall, logical reasoning, lexical fidelity, persona tone, and syntactic style. Experimental results reveal that while advanced models achieve moderate accuracy in persona simulation, they still fall short in capabilities such as syntactic style and memory recall. Consequently, the average performance achieved by LLMs remains considerably below the human baseline.
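The three-dimension, six-capability structure described above lends itself to a simple per-capability scoring scheme. The paper does not specify its aggregation formula, so the sketch below is only an illustrative assumption: per-capability accuracies are averaged within each persona dimension, then unweighted across the three dimensions. All scores and names are hypothetical, not results from the benchmark.

```python
# Hypothetical TwinVoice-style score aggregation (illustrative only;
# the benchmark's actual aggregation method is not specified here).

CAPABILITIES = [
    "opinion_consistency", "memory_recall", "logical_reasoning",
    "lexical_fidelity", "persona_tone", "syntactic_style",
]

def dimension_score(capability_scores: dict) -> float:
    """Mean accuracy over the six capabilities for one persona dimension."""
    return sum(capability_scores[c] for c in CAPABILITIES) / len(CAPABILITIES)

def overall_score(per_dimension: dict) -> float:
    """Unweighted mean across the Social, Interpersonal, and Narrative dimensions."""
    return sum(dimension_score(s) for s in per_dimension.values()) / len(per_dimension)

# Illustrative scores for one model (invented numbers, not from the paper):
scores = {
    "social": dict.fromkeys(CAPABILITIES, 0.60),
    "interpersonal": dict.fromkeys(CAPABILITIES, 0.50),
    "narrative": dict.fromkeys(CAPABILITIES, 0.55),
}
print(round(overall_score(scores), 2))  # 0.55
```

Keeping each capability's score separate before averaging is what allows the granular findings reported above, e.g. that syntactic style and memory recall lag other capabilities even when overall accuracy is moderate.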