Zheming Li

Moonshot AI

Papers on Lattice

Total citations

Topics

h-index

Research focus

Architecture Design (Transformers, SSMs, MoE) (1) · Distributed Systems & Hardware (1) · Inference & Quantization (1)

Frequent co-authors

Ruoyu Qin (1) · Weiran He (1) · Yaoyu Wang (1) · Xinran Xu (1)

Papers (1)

Apr 16, 2026 · also Tsinghua AI

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter

Decoupling LLM prefill and decode across datacenters is now practical, unlocking independent scaling and resource elasticity, thanks to a system that combines KV-efficient models with intelligent request scheduling.

Ruoyu Qin, Weiran He, Yaoyu Wang +5

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization
