Search papers, labs, and topics across Lattice.
Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.
LLM agents can learn to explore novel states and generalize to new tasks using a hybrid on- and off-policy RL framework that leverages memory.