Large reasoning models (LRMs) already know when to stop reasoning, but current sampling methods are holding them back.
Stop overfitting your reward model: R2M leverages real-time policy feedback to dynamically align the reward model with the evolving policy distribution, reducing reward overoptimization in RLHF.