Ran He

Papers on Lattice

Total citations

Topics

h-index

Publication activitypapers/week, last 8 weeks

Research focus

Inference & Quantization (3)Training Efficiency & Optimization (3)Architecture Design (Transformers, SSMs, MoE) (1)Reasoning & Chain-of-Thought (1)

Frequent co-authors

Yuan Xu (2)Hejian Sang (2)Zhengze Zhou (2)Zhipeng Wang (2)

Papers (3)

Mar 11, 2026

Yuan Xu +4Mar 11, 2026

PACED: Distillation at the Frontier of Student Competence

Stop wasting compute on easy and impossible examples: PACED distillation focuses your student model's training on the sweet spot where it actually learns.

Yuan Xu, Hejian Sang, Zhengze Zhou +2

Inference & Quantization Training Efficiency & Optimization

Mar 6, 2026

FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling

Forget slow attention: FlashPrefill achieves a staggering 27x speedup in long-context prefilling by instantly discovering and thresholding sparse attention patterns.

Qihang Fan, Qihang Fan, Zhiying Wu +3

Architecture Design (Transformers, SSMs, MoE)Inference & Quantization Training Efficiency & Optimization

Mar 5, 2026

Hejian Sang +9Mar 5, 2026

On-Policy Self-Distillation for Reasoning Compression

Reasoning models aren't just verbose, they're actively *harmed* by their own verbosity, but a simple self-distillation trick can compress their outputs by up to 59% while boosting accuracy by up to 16 points.

Hejian Sang, Yuan Xu, Yuanda Xu +7

Inference & Quantization Reasoning & Chain-of-Thought Training Efficiency & Optimization

Search

Ran He

Publication activitypapers/week, last 8 weeks

Research focus

Frequent co-authors

Papers (3)