Yizhou Sun

Papers on Lattice

Total citations

Topics

h-index

Research focus

Architecture Design (Transformers, SSMs, MoE) (2) · Inference & Quantization (2) · Natural Language Processing (2)

Frequent co-authors

Zihao Xu (1) · John Harvill (1) · John Harvill (1) · Ziwei Fan (1)

Papers (2)

Apr 16, 2026

Amazon Science · also JHU, UIUC

Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models

Achieve 75% input length reduction in LLMs with minimal performance loss by compressing token embeddings directly in the latent space.

Zihao Xu, John Harvill, John Harvill +4

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Natural Language Processing

Mar 9, 2026

ConFu: Contemplate the Future for Better Speculative Sampling

By enabling draft models to "contemplate the future," ConFu achieves significant speedups in speculative decoding, outperforming EAGLE-3 by 8-11% on Llama-3 models.

Zongyue Qin, Raghavv Goel, Mukul Gagrani +3

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Natural Language Processing
