SELF-EMO is a self-evolution framework that improves emotional understanding and expression in LLMs through role-based self-play. In this setup, the model acts as both emotion recognizer and dialogue responder. Self-play produces diverse conversational trajectories that support scalable data generation. A data flywheel then filters and refines these samples with a smoothed IoU-based reward and feeds the selected samples back, so the model improves continuously without external supervision. The framework also includes SELF-GRPO, a reinforcement learning algorithm that stabilizes optimization with multi-label alignment rewards and group-level consistency signals. SELF-EMO achieves state-of-the-art results on IEMOCAP, MELD, and EmoryNLP.
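The summary does not spell out how the smoothed IoU reward is computed. The sketch below shows one plausible form, assuming additive (Laplace-style) smoothing over predicted and reference emotion label sets. It also shows a threshold filter in the spirit of the data flywheel. The function names `smoothed_iou` and `flywheel_filter`, the smoothing constant `eps`, and the `threshold` value are all hypothetical. It is also an assumption what the reference set is, for example a dataset's annotated emotions. This is not the paper's implementation.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def smoothed_iou(pred: Iterable[Hashable], gold: Iterable[Hashable], eps: float = 1.0) -> float:
    """Additive-smoothed IoU between predicted and reference label sets.

    Hypothetical smoothing: (|P & G| + eps) / (|P | G| + eps).
    It stays defined when both sets are empty and gives partial credit
    for partially overlapping multi-label predictions.
    """
    p, g = set(pred), set(gold)
    return (len(p & g) + eps) / (len(p | g) + eps)


def flywheel_filter(
    candidates: List[Set[Hashable]], gold: Set[Hashable], threshold: float = 0.5
) -> List[Tuple[Set[Hashable], float]]:
    """Keep (candidate, reward) pairs whose smoothed IoU meets the threshold.

    Illustrative only: the real selection rule in SELF-EMO may differ.
    """
    scored = [(c, smoothed_iou(c, gold)) for c in candidates]
    return [(c, r) for c, r in scored if r >= threshold]


if __name__ == "__main__":
    gold = {"joy"}
    candidates = [{"joy"}, {"joy", "surprise"}, {"anger"}]
    for cand, reward in flywheel_filter(candidates, gold):
        print(sorted(cand), round(reward, 3))
```

With `eps = 1`, an exact match scores 1.0 and a prediction that adds one extra label scores 2/3. A fully wrong single label scores 1/3, so a 0.5 threshold keeps the first two candidates and drops the third. Smoothing softens the reward's jump between partial and exact matches. This is one reason a smoothed score can be preferable to raw IoU for ranking candidates.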
LLMs can significantly boost their emotional intelligence simply by role-playing conversations with themselves, iteratively refining their ability to both recognize and express emotions.
Emotion Recognition in Conversation (ERC) has become a fundamental capability for large language models (LLMs) in human-centric interaction. Beyond accurate recognition, coherent emotional expression is also crucial, yet both are limited by the scarcity and static nature of high-quality annotated data. In this work, we propose SELF-EMO, a self-evolution framework grounded in the hypothesis that better emotion prediction leads to more consistent emotional responses. We introduce two auxiliary tasks, emotional understanding and emotional expression, and design a role-based self-play paradigm where the model acts as both an emotion recognizer and a dialogue responder. Through iterative interactions, the model generates diverse conversational trajectories, enabling scalable data generation. To ensure quality, we adopt a data flywheel mechanism that filters candidate predictions and responses using a smoothed IoU-based reward and feeds selected samples back for continuous self-improvement without external supervision. We further develop SELF-GRPO, a reinforcement learning algorithm that stabilizes optimization with multi-label alignment rewards and group-level consistency signals. Experiments on IEMOCAP, MELD, and EmoryNLP show that SELF-EMO achieves state-of-the-art performance, improving accuracy by +6.33% on Qwen3-4B and +8.54% on Qwen3-8B, demonstrating strong effectiveness and generalization.
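SELF-GRPO builds on GRPO. The abstract does not define its multi-label alignment reward or its group-level consistency signal. The sketch below therefore shows only the standard GRPO group-relative advantage. Each sampled response's reward is normalized against the mean and standard deviation of the rewards in its own group. Per-sample rewards are taken as given, for example scores like a smoothed IoU. The function name `group_relative_advantages` and the stabilizing `eps` are illustrative. The use of population rather than sample standard deviation is an assumption.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage for one group of sampled responses.

    A_i = (r_i - mean(r)) / (std(r) + eps), using the population standard
    deviation of the group. This is plain GRPO, not SELF-GRPO's additional
    consistency terms, which are not specified in the abstract.
    """
    if not rewards:
        raise ValueError("a group must contain at least one reward")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four responses sampled for the same dialogue turn.
    group = [1.0, 2 / 3, 1 / 3, 1 / 3]
    print([round(a, 3) for a in group_relative_advantages(group)])
```

Because advantages are relative within a group, a group whose responses all receive the same reward produces zero advantage and contributes no policy-gradient signal. Only responses that beat or trail their siblings move the policy.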