Junlin Yang

Tsinghua AI

Research focus

RLHF & Preference Learning (1)
Scalable Oversight & Alignment Theory (1)
Training Efficiency & Optimization (1)

Frequent co-authors

Bingxiang He (1)
Yuxin Zuo (1)
Zeyuan Liu (1)
Shangziqi Zhao (1)

Papers (1)

Mar 9, 2026

How Far Can Unsupervised RLVR Scale LLM Training?

Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.

Bingxiang He, Yuxin Zuo, Zeyuan Liu +23

RLHF & Preference Learning, Scalable Oversight & Alignment Theory, Training Efficiency & Optimization
