Xinyang Ma

Research focus

Architecture Design (Transformers, SSMs, MoE) (1)
Distributed Systems & Hardware (1)
Inference & Quantization (1)

Frequent co-authors

Zedong Liu (1), Dejun Luo (1), Hairui Zhao (1), Bing Lu (1)

Papers (1)

May 13, 2026

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

Forget static KV cache compression – KVServe dynamically adapts compression strategies to your service context, slashing latency by up to 32.8x in disaggregated LLM serving.

Zedong Liu, Xinyang Ma, Dejun Luo +9

Architecture Design (Transformers, SSMs, MoE)
Distributed Systems & Hardware
Inference & Quantization
