Lattice AI Research

Research focus

Distributed Systems & Hardware (2) · Inference & Quantization (2) · Architecture Design (Transformers, SSMs, MoE) (1) · Training Efficiency & Optimization (1)

Frequent co-authors

Bing Lu (2) · Wenjing Huang (2) · Dingwen Tao (2) · Zedong Liu (1)

Papers (2)

May 13, 2026·also D Pareto candidate set

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

Forget static KV cache compression – KVServe dynamically adapts compression strategies to your service context, slashing latency by up to 32.8x in disaggregated LLM serving.

Zedong Liu, Xinyang Ma, Dejun Luo +9

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Apr 27, 2026·also ICT CAS, USTC

TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training

Squeezing intermediate tensors with FP8 quantization and adaptive transforms can nearly double the throughput of tensor-parallel LLM training without sacrificing accuracy.

Man Liu, Xingjian Tian, Bing Lu +6

Distributed Systems & Hardware · Inference & Quantization · Training Efficiency & Optimization

Zheng Wei