Zhuokai Zhao

Stronger coding agents can achieve higher success rates while requiring fewer user interventions, reshaping our understanding of effective coding assistance.

Yifan Wu, Zhuokai Zhao, Songlin Li +7

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Tool Use & Agents

Jun 17, 2026

Meta AI · 3w ago · also DeepMind

SAGE-OPD: Selective Agent-Guided Intervention for Multi-Turn On-Policy Distillation

Selective teacher intervention in multi-turn training can boost agent performance by over 13% by mitigating the impact of early errors.

Yifan Wu, Jiayi Liu, Xiangjun Fan +1

RLHF & Preference Learning · Training Efficiency & Optimization

May 31, 2026

Meta AI · May 31, 2026

OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification

Chunk-level semantic verification in OmniOPD yields a +28.64% boost in math performance over traditional OPD, challenging the reliance on token-level logit matching.

Yuhang Zhou, Yifan Wu, Mingyi Wang +4

Inference & Quantization · RLHF & Preference Learning · Training Efficiency & Optimization

Apr 6, 2026

Yuhang Zhou +6 · Apr 6, 2026 · also Meta AI

Synthetic Sandbox for Training Machine Learning Engineering Agents

On-policy RL for machine learning engineering agents is now practical, thanks to a synthetic sandbox that slashes execution time by 13x while boosting performance by up to 67%.

Yuhang Zhou, Lizhu Zhang, Yifan Wu +4

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Tool Use & Agents

Mar 19, 2026

Arushi Rai +6 · Mar 19, 2026 · also Meta AI

TARo: Token-level Adaptive Routing for LLM Test-time Alignment

Achieve significant reasoning gains in frozen LLMs (+22.4%) without retraining by adaptively routing reward model guidance at the token level during inference.

Arushi Rai, Qiang Zhang, Hanqing Zeng +4

Inference & Quantization · Reasoning & Chain-of-Thought · RLHF & Preference Learning



Papers (6)