Han Tian

University of Science and Technology of China

Abstract. Meeting stringent Time-To-First-Token (TTFT) requirements is crucial for LLM applications. To improve efficiency, modern LLM serving systems adopt disaggregated architectures with diverse parallelisms, introducing complex multi-stage workflows involving reusable KV-block retrieval, collective communication, and P


Research focus

Architecture Design (Transformers, SSMs, MoE) (1); Distributed Systems & Hardware (1); Inference & Quantization (1)

Frequent co-authors

Yijun Sun (1), Xudong Liao (1), Songrun Xie (1), Songru Xie (1)

Papers (1)

Mar 18, 2026

Multi-stage Flow Scheduling for LLM Serving

LLM serving systems can boost Time-To-First-Token (TTFT) attainment by up to 2.4x simply by prioritizing network flows based on a novel approximation of Least-Laxity-First scheduling.

Yijun Sun, Xudong Liao, Songrun Xie, and 10 others

Architecture Design (Transformers, SSMs, MoE); Distributed Systems & Hardware; Inference & Quantization
