Zhengze Zhou

Papers on Lattice

Total citations

Topics

h-index

Publication activitypapers/week, last 8 weeks

Research focus

Reasoning & Chain-of-Thought (3)RLHF & Preference Learning (2)Inference & Quantization (2)Training Efficiency & Optimization (2)

Frequent co-authors

Yuanda Xu (4)Hejian Sang (4)Zhipeng Wang (4)Ran He (2)

Papers (4)

Jul 6, 2026

Yuanda Xu +132w ago·also LinkedIn Corporation

TREK: Distill to Explore, Reinforce to Refine

TREK transforms the way models tackle challenging prompts by expanding their exploration support, leading to substantial performance gains even in the hardest task scenarios.

Yuanda Xu, Zhengze Zhou, Kayhan Behdin +11

Reasoning & Chain-of-Thought RLHF & Preference Learning

Mar 11, 2026

Yuanda Xu +4Mar 11, 2026

PACED: Distillation at the Frontier of Student Competence

Stop wasting compute on easy and impossible examples: PACED distillation focuses your student model's training on the sweet spot where it actually learns.

Yuanda Xu, Hejian Sang, Zhengze Zhou +2

Inference & Quantization Training Efficiency & Optimization

Mar 5, 2026

Hejian Sang +6Mar 5, 2026

On-Policy Self-Distillation for Reasoning Compression

Reasoning models aren't just verbose, they're actively *harmed* by their own verbosity, but a simple self-distillation trick can compress their outputs by up to 59% while boosting accuracy by up to 16 points.

Hejian Sang, Yuanda Xu, Zhengze Zhou +4

Inference & Quantization Reasoning & Chain-of-Thought Training Efficiency & Optimization

Feb 24, 2026

Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning

Overconfident errors in RLVR monopolize probability mass and suppress exploration, but a confidence-aware penalty fixes this and boosts mathematical reasoning performance.