Tsinghua AI | McGill | May 25, 2026 | arXiv:2605.25511

CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents

Yihong Tang, Kehai Chen, Liang Yue, Benyou Wang, Min Zhang

AI Summary

This paper introduces Character-Centric Group Relative Policy Optimization (CRPO) to address the issue of character fidelity loss and style collapse when applying reinforcement learning methods like GRPO to role-playing agents. CRPO decouples task logic from stylistic rewards, dynamically adapts optimization constraints based on character complexity, and uses generic responses as negative baselines. Experiments show CRPO significantly improves character consistency and emotional expression compared to existing methods.

Key Contribution

RL fine-tuning can make your role-playing agent *worse* at embodying its character, unless you carefully balance task rewards with stylistic constraints.

Abstract

Recent advancements in Reinforcement Learning (RL), particularly Group Relative Policy Optimization (GRPO), have significantly enhanced the reasoning capabilities of Large Language Models. However, applying these problem-centric optimization methods to role-playing agents often leads to a loss of character fidelity and style collapse, as they prioritize context-specific utility over persona alignment. To address this, we propose Character-Centric Group Relative Policy Optimization (CRPO), a framework designed to realign RL objectives with the role-playing task. CRPO improves character distinctiveness through three mechanisms: decoupling task logic from stylistic rewards to resolve gradient conflicts, dynamically adapting optimization constraints based on character complexity, and utilizing generic responses as negative baselines to prevent the model from reverting to a common distribution. Extensive experiments demonstrate that CRPO outperforms existing methods on consistency, emotional expression, and other evaluation dimensions.
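To make the first and third mechanisms concrete, here is a minimal Python sketch. It is not the authors' implementation: this page gives no reward definitions or loss, so only `group_relative_advantages` reflects the well-known GRPO normalization (reward minus group mean, divided by group standard deviation). The function `decoupled_advantages`, its parameters `generic_style_reward` and `style_weight`, and the way the two signals are combined are all assumptions made for illustration.

```python
from statistics import mean, pstdev

EPS = 1e-8  # avoids division by zero when every reward in a group is identical


def group_relative_advantages(rewards):
    """GRPO-style normalization within one sampled group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + EPS) for r in rewards]


def decoupled_advantages(task_rewards, style_rewards,
                         generic_style_reward, style_weight=1.0):
    """Hypothetical sketch of decoupled rewards with a generic negative baseline.

    Assumption: task rewards are normalized against the group mean as in GRPO,
    while style rewards are measured against the style score of a generic
    (persona-free) response instead of the group mean.
    """
    task_adv = group_relative_advantages(task_rewards)
    sigma = pstdev(style_rewards)
    style_adv = [(s - generic_style_reward) / (sigma + EPS)
                 for s in style_rewards]
    return [t + style_weight * s for t, s in zip(task_adv, style_adv)]


if __name__ == "__main__":
    # All three samples solve the task equally well, so only style differs.
    task = [1.0, 1.0, 1.0]
    style = [0.2, 0.5, 0.8]
    print(decoupled_advantages(task, style, generic_style_reward=0.5))
    # With a stronger generic baseline, every sample is penalized on style.
    print(decoupled_advantages(task, style, generic_style_reward=0.9))
```

Under these assumptions, a group whose samples all score below the generic response on style receives only negative style advantages. A plain group-mean baseline would still reward the best of a uniformly bland group, which is one plausible reading of how a generic negative baseline could discourage reverting to a common distribution.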

Reasoning & Chain-of-Thought | RLHF & Preference Learning | Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
