Jiebin Zhang

Papers on Lattice

Total citations

Topics

h-index

Publication activity (papers/week, last 8 weeks)

Research focus

Inference & Quantization (3), Natural Language Processing (1), Scaling Laws & Emergent Abilities (1), RLHF & Preference Learning (1)

Frequent co-authors

Zhenghan Yu (2), Eugene J. Yu (2), Dawei Zhu (2), Yifan Song (2)

Papers (3)

Jul 16, 2026

Tianyu Liu +9 · 1w ago

D-cut: Adaptive Verification Depth Pruning for Batched Speculative Decoding

D-Cut cuts verification costs in batched speculative decoding by pruning verification depth, achieving up to 3.0x speedup over traditional methods in high-concurrency scenarios.

Tianyu Liu, Yuhao Shen, Rui Cen +7

Inference & Quantization, Natural Language Processing

Jun 1, 2026

Jun 1, 2026 · also Key Laboratory of Computational, Tencent AI, UIUC

DFlare: Scaling Up Draft Capacity for Block Diffusion Speculative Decoding

DFlare achieves up to 5.52x speedup in LLM inference by allowing draft layers to independently leverage richer target knowledge, breaking through previous capacity constraints.

Jiebin Zhang, Zhenghan Yu, Eugene J. Yu +6

Inference & Quantization, Scaling Laws & Emergent Abilities

Mar 2, 2026

Jiebin Zhang +7 · Mar 2, 2026 · also Tsinghua AI, Key Laboratory of Computational, NUDT, PKU +1

Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning

Speculative decoding gets a throughput boost of up to 4.32x by using reinforcement learning to dynamically balance drafting and verification.

Jiebin Zhang, Zhenghan Yu, Eugene J. Yu +5

Inference & Quantization, RLHF & Preference Learning
