Xunzhuo Liu

MBZUAI, McGill University, Mila, AMD

Research focus

Distributed Systems & Hardware (4)
Inference & Quantization (4)
Scaling Laws & Emergent Abilities (1)
Architecture Design (Transformers, SSMs, MoE) (1)

Frequent co-authors

Bowei He (3)
Yuhan Liu (3)
Junchen Jiang (3)
Xue Liu (3)

Papers (4)

Apr 9, 2026

Mila · Apr 9, 2026 · also MBZUAI, McGill, Steve

Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving

Stop overpaying for LLM serving: intelligently routing requests to specialized pools based on token budget slashes GPU costs by up to 42% and dramatically improves reliability.

Xunzhuo Liu, Bowei He, Xue Liu +3

Distributed Systems & Hardware · Inference & Quantization

Mar 18, 2026

Mar 18, 2026 · also Mila, Charlie, McGill

The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency

Forget buying new GPUs – clever context-length routing can boost your LLM inference energy efficiency by 2.5x, dwarfing the 1.7x gain from upgrading to a B200.

Huamin Chen, Xunzhuo Liu, Yuhan Liu +3

Distributed Systems & Hardware · Inference & Quantization · Scaling Laws & Emergent Abilities

Mar 17, 2026

Mila · Mar 17, 2026 · also Charlie, McGill, Steve, UChicago

inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference

Seemingly idle LLM inference fleets can be secretly broken, and this simulator helps you find out why before you buy.

Xunzhuo Liu, Yuhan Liu, Junchen Jiang +2

Distributed Systems & Hardware · Inference & Quantization

Mila · Mar 17, 2026 · also Charlie, McGill, Steve, UChicago

FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism

LLM GPU fleets can be analytically optimized into a two-pool architecture with gateway-layer compression, slashing costs by up to 82% without sacrificing latency.

Xunzhuo Liu, Yuhan Liu, Junchen Jiang +2

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization
