Wenjing Huang

Institute of Computing Technology, Chinese Academy of Sciences

Research focus

Distributed Systems & Hardware (3) · Inference & Quantization (2) · Training Efficiency & Optimization (2) · Architecture Design (Transformers, SSMs, MoE) (1)

Frequent co-authors

Zedong Liu (2) · Hairui Zhao (2) · Bing Lu (2) · Yida Gu (2)

Papers (3)

May 13, 2026 · also D Pareto candidate set

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

Forget static KV cache compression – KVServe dynamically adapts compression strategies to your service context, slashing latency by up to 32.8x in disaggregated LLM serving.

Zedong Liu, Xinyang Ma, Dejun Luo +9

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Apr 27, 2026 · also ICT CAS, USTC

TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training

Squeezing intermediate tensors with FP8 quantization and adaptive transforms can nearly double the throughput of tensor-parallel LLM training without sacrificing accuracy.

Man Liu, Xingjian Tian, Bing Lu +6

Distributed Systems & Hardware · Inference & Quantization · Training Efficiency & Optimization

Jan 28, 2026 · also Ant Group, Jilin

CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training

Cut your debugging time: CCL-D slashes the diagnosis time for slow/hang anomalies in large-scale distributed training from days to just 6 minutes.

Yida Gu, Fakang Wang, Jianhao Fu +17

Distributed Systems & Hardware · Training Efficiency & Optimization
