Emory, HIT | Feb 17, 2026 | arXiv:2602.15669

PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra

Xiachong Feng, Liang Zhao, Weihong Zhong, Yichong Huang, Yuxuan Gu, Lingpeng Kong, Xiaocheng Feng, Bing Qin

AI Summary

The paper introduces PERSONA, a training-free framework for dynamic and compositional personality control in LLMs that works by manipulating activation vectors. It uses contrastive activation analysis to identify approximately orthogonal trait vectors in the model's representation space, then steers personality through vector algebra: scaling for intensity, addition for composition, and subtraction for suppression. PERSONA nearly matches fine-tuning on PersonalityBench (mean score 9.60 versus 9.61) and reaches win rates of up to 91% on a new Persona-Evolve benchmark for dynamic personality adaptation, which the authors take as evidence that aspects of LLM personality are mathematically tractable.
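To make the phrase "contrastive activation analysis" concrete, here is a minimal sketch, assuming the common difference-of-means recipe: average the activations collected under trait-positive prompts, average those from trait-negative prompts, and subtract. The summary does not give the paper's exact extraction procedure, so the recipe, the function names, and the toy 3-dimensional "activations" below are illustrative assumptions, not the authors' implementation.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def trait_vector(pos_acts, neg_acts):
    """Difference of means: mean(trait-positive) minus mean(trait-negative).

    Assumed recipe for illustration; the paper's procedure may differ.
    """
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def cosine(u, v):
    """Cosine similarity; values near 0 indicate near-orthogonal directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up toy activations (hypothetical data, not from the paper).
extraverted = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
introverted = [[0.0, 0.0, 1.0], [2.0, 0.0, 1.0]]
agreeable = [[1.0, 3.0, 1.0], [1.0, 5.0, 1.0]]
disagreeable = [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]]

v_extra = trait_vector(extraverted, introverted)
v_agree = trait_vector(agreeable, disagreeable)

print(v_extra, v_agree, cosine(v_extra, v_agree))
```

In this toy setup the two extracted directions land on different axes, so their cosine similarity is zero. With real model activations one would expect only approximate orthogonality, which is what the abstract claims.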

Key Contribution

LLM personalities can be steered with fine-tuning-level precision, compositionality, and context-awareness, all without training, by directly manipulating activation vectors in representation space.

Abstract

Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves fine-tuning level performance through direct manipulation of personality vectors in activation space. Our key insight is that personality traits appear as extractable, approximately orthogonal directions in the model's representation space that support algebraic operations. The framework operates through three stages: Persona-Base extracts orthogonal trait vectors via contrastive activation analysis; Persona-Algebra enables precise control through vector arithmetic (scalar multiplication for intensity, addition for composition, subtraction for suppression); and Persona-Flow achieves context-aware adaptation by dynamically composing these vectors during inference. On PersonalityBench, our approach achieves a mean score of 9.60, nearly matching the supervised fine-tuning upper bound of 9.61 without any gradient updates. On our proposed Persona-Evolve benchmark for dynamic personality adaptation, we achieve up to 91% win rates across diverse model families. These results provide evidence that aspects of LLM personality are mathematically tractable, opening new directions for interpretable and efficient behavioral control.
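The vector arithmetic described in the abstract (scalar multiplication for intensity, addition for composition, subtraction for suppression) can be sketched as a weighted sum of trait vectors added to a hidden state. This is a sketch under stated assumptions: the unit trait vectors, the `steer` helper, and especially the keyword rule in `coefficients_for` are hypothetical stand-ins. The abstract does not say how Persona-Flow chooses coefficients from context, nor at which layers the vectors are applied.

```python
def steer(hidden, trait_vectors, coeffs):
    """Return hidden + sum over traits of coeffs[trait] * trait_vectors[trait].

    A positive coefficient adds a trait, a negative one suppresses it,
    and the magnitude sets the intensity.
    """
    out = list(hidden)
    for name, alpha in coeffs.items():
        direction = trait_vectors[name]
        out = [h + alpha * d for h, d in zip(out, direction)]
    return out


# Hypothetical unit trait directions in a toy 3-dimensional activation space.
trait_vectors = {
    "extraversion": [1.0, 0.0, 0.0],
    "agreeableness": [0.0, 1.0, 0.0],
}
hidden = [0.5, 0.5, 0.5]

# Static composition: strong extraversion combined with suppressed agreeableness.
composed = steer(hidden, trait_vectors,
                 {"extraversion": 2.0, "agreeableness": -1.0})


def coefficients_for(context):
    """Placeholder for context-aware selection (an assumption, not the paper's rule)."""
    if "complaint" in context:
        return {"agreeableness": 1.5, "extraversion": -0.5}
    return {"extraversion": 1.0}


# Dynamic composition: coefficients are recomputed from the current context.
dynamic = steer(hidden, trait_vectors, coefficients_for("customer complaint"))

print(composed, dynamic)
```

Because steering is a plain linear combination, composing several traits only changes the coefficient dictionary, which is why this style of control needs no gradient updates.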

Interpretability & Mechanistic Interp | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
