Yingbin Liang

LLMs can now reliably follow complex, hierarchical instructions thanks to a new constrained RL framework that treats system prompts as strict algorithmic boundaries.

Keru Chen, Sen Lin, Yingbin Liang +3

Constitutional AI & AI Ethics · RLHF & Preference Learning

Mar 3, 2026

Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails

Adam's faster convergence isn't just empirical luck: its second-moment normalization provably yields sharper tails in high-probability convergence guarantees compared to SGD.

Yingbin Liang

Training Efficiency & Optimization

Feb 16, 2026

CMU ML · Feb 16, 2026 · also Ohio State, UIUC

On the Learning Dynamics of RLVR at the Edge of Competence

RLVR's success in long-horizon reasoning hinges on a smooth difficulty spectrum, where mastering easier sub-problems unlocks the ability to tackle harder ones, avoiding frustrating grokking plateaus.

Yuejie Chi, Yuting Wei, Yingbin Liang

Architecture Design (Transformers, SSMs, MoE) · Reasoning & Chain-of-Thought · RLHF & Preference Learning

Yingbin Liang

Publication activity (papers/week, last 8 weeks)

Research focus

Frequent co-authors

Papers (5)