Microsoft Research | CUHK | Mar 10, 2026 | arXiv:2603.09249

Social-R1: Towards Human-like Social Reasoning in LLMs

Jincenzi Wu, Yuxuan Lei, Jianxun Lian, Yitian Huang, Lexin Zhou, Haotian Li, Helen M. Meng

AI Summary

The paper introduces Social-R1, a reinforcement learning framework designed to improve social reasoning in LLMs by aligning model reasoning processes with human cognition. Social-R1 uses multi-dimensional rewards to supervise the entire reasoning trajectory, enforcing structural alignment, logical integrity, and information density. Training draws on hard cases from ToMBench-Hard, a new adversarial benchmark introduced in the same paper. Experiments show that a 4B-parameter model trained this way surpasses much larger models and generalizes across eight social reasoning benchmarks.

Key Contribution

A 4B-parameter model surpasses much larger models at social reasoning when trained with a new RL framework that aligns model reasoning trajectories with human cognition.

Abstract

While large language models demonstrate remarkable capabilities across numerous domains, social intelligence - the capacity to perceive social cues, infer mental states, and generate appropriate responses - remains a critical challenge, particularly for enabling effective human-AI collaboration and developing AI that truly serves human needs. Current models often rely on superficial patterns rather than genuine social reasoning. We argue that cultivating human-like social intelligence requires training with challenging cases that resist shortcut solutions. To this end, we introduce ToMBench-Hard, an adversarial benchmark designed to provide hard training examples for social reasoning. Building on this, we propose Social-R1, a reinforcement learning framework that aligns model reasoning with human cognition through multi-dimensional rewards. Unlike outcome-based RL, Social-R1 supervises the entire reasoning process, enforcing structural alignment, logical integrity, and information density. Results show that our approach enables a 4B parameter model to surpass much larger counterparts and generalize robustly across eight diverse benchmarks. These findings demonstrate that challenging training cases with trajectory-level alignment offer a path toward efficient and reliable social intelligence.
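To illustrate how trajectory-level supervision differs from outcome-based RL, here is a minimal sketch of a combined reward. It is not the authors' implementation: the names `ProcessScores` and `combined_reward`, the [0, 1] score range, and the weights are all assumptions made for illustration, and the abstract does not say how the three process dimensions are actually scored or weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessScores:
    """Hypothetical per-trajectory scores, each assumed to lie in [0, 1]."""
    structure: float  # structural alignment of the reasoning steps
    logic: float      # logical integrity of the reasoning chain
    density: float    # information density of the reasoning text


def combined_reward(
    answer_correct: bool,
    scores: ProcessScores,
    weights: tuple[float, float, float, float] = (0.4, 0.2, 0.2, 0.2),
) -> float:
    """Weighted sum of an outcome term and three process terms.

    The weights (outcome, structure, logic, density) are illustrative
    assumptions, not values taken from the paper.
    """
    components = (
        1.0 if answer_correct else 0.0,
        scores.structure,
        scores.logic,
        scores.density,
    )
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("all scores must lie in [0, 1]")
    return sum(w * c for w, c in zip(weights, components))


# Two trajectories with the same correct answer but different reasoning quality.
careful = combined_reward(True, ProcessScores(0.9, 0.9, 0.8))
shortcut = combined_reward(True, ProcessScores(0.2, 0.1, 0.3))
```

An outcome-only reward would assign both trajectories the same value because both answers are correct. In this sketch the process terms separate them, so a trajectory that reaches the right answer through a shortcut earns less than one with well-formed reasoning.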

Constitutional AI & AI Ethics | Natural Language Processing | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
