OPD's "free lunch" of dense token-level reward may be an illusion: teacher novelty, not just higher scores, drives successful distillation.
LLMs can achieve massive performance gains on reasoning and knowledge-intensive tasks simply by iteratively refining their answers using pseudo-labels derived from unlabeled data.
Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.