Mingxing Zhang

Papers on Lattice

Research focus

Distributed Systems & Hardware (3) · Inference & Quantization (3) · Architecture Design (Transformers, SSMs, MoE) (2) · Training Efficiency & Optimization (1)

Frequent co-authors

Ruoyu Qin (1) · Weiran He (1) · Yaoyu Wang (1) · Zheming Li (1)

Papers (4)

Apr 16, 2026 · also Tsinghua AI

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter

A system that combines KV-efficient models with intelligent request scheduling makes it practical to decouple LLM prefill and decode across datacenters, enabling independent scaling and resource elasticity.

Ruoyu Qin, Weiran He, Yaoyu Wang +5

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Independent Researcher · Apr 1, 2026 · also DAMO, Tsinghua AI, Ant Group, Moonshot +1

TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving

Instead of relying on static data paths, TENT dynamically slices LLM data and sprays it across heterogeneous interconnects, self-healing in under 50 ms and improving throughput by up to 36%.

Yineng Zhang, Yuhao Fu, Mingxing Zhang

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization +1

Mingxing Zhang +11 · Mar 19, 2026

SHAPCA: Consistent and Interpretable Explanations for Machine Learning Models on Spectroscopy Data

Explanations of ML models trained on spectroscopy data are often unstable; SHAPCA combines PCA with SHAP values to produce more consistent, interpretable explanations expressed in the original input space.

Mingxing Zhang, Mingxin Zhang, Nicola Rossberg +9

Interpretability & Mechanistic Interp · Scientific Discovery & Drug Design

Feb 25, 2026 · also PKU

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Doubles LLM inference throughput by routing the KV cache through decoding engines, bypassing the storage bandwidth bottleneck on prefill engines.

Yongtong Wu, Shaoyuan Chen, Yinmin Zhong +10

Distributed Systems & Hardware · Inference & Quantization · Tool Use & Agents
