University of Chicago
Forget buying new GPUs – clever context-length routing can boost your LLM inference energy efficiency by 2.5x, dwarfing the 1.7x gain from upgrading to a B200.
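The routing idea is simple enough to sketch. Below is a minimal, hypothetical Python illustration of context-length routing at an inference gateway; the pool names, threshold, and per-token energy figures are invented for illustration and are not taken from the paper.

```python
# Sketch of context-length routing: short-context requests go to an
# energy-efficient pool; long contexts go to the high-memory pool.
# Pool names, threshold, and J/token numbers are hypothetical.

from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    joules_per_token: float  # hypothetical energy cost per generated token

SHORT_CTX_POOL = Pool("a100-efficiency", joules_per_token=0.4)
LONG_CTX_POOL = Pool("b200-capacity", joules_per_token=0.7)
CTX_THRESHOLD = 4096  # hypothetical routing cutoff, in prompt tokens

def route(prompt_tokens: int) -> Pool:
    """Pick a pool by context length; short prompts take the cheaper pool."""
    return SHORT_CTX_POOL if prompt_tokens < CTX_THRESHOLD else LONG_CTX_POOL

if __name__ == "__main__":
    for n in (512, 8192):
        pool = route(n)
        print(f"{n}-token prompt -> {pool.name} ({pool.joules_per_token} J/token)")
```

In practice the threshold would come from profiling, but the routing decision itself is a one-line predicate at the gateway, which is why it can beat a hardware upgrade on cost.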
LLM inference fleets that look idle can be silently broken, and this simulator helps you find out why before you buy.
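To make that claim concrete, here is a toy queueing simulation (not the paper's simulator) showing how a fleet can report low average utilization yet still queue badly under bursty traffic; every parameter here is made up.

```python
# Toy simulation: average utilization looks idle, but bursty arrivals
# still produce long queueing delays. All parameters are illustrative.

import heapq
import random

def simulate(servers=8, service_s=2.0, mean_gap_s=1.0, burst=10, n=5000, seed=0):
    rng = random.Random(seed)
    free_at = [0.0] * servers          # when each GPU next becomes free
    heapq.heapify(free_at)
    t, busy_time, waits = 0.0, 0.0, []
    for i in range(n):
        # Bursty arrivals: requests land in clumps of `burst` at each gap.
        if i % burst == 0:
            t += rng.expovariate(1.0 / mean_gap_s) * burst
        start = max(t, heapq.heappop(free_at))
        waits.append(start - t)
        heapq.heappush(free_at, start + service_s)
        busy_time += service_s
    makespan = max(free_at)
    util = busy_time / (servers * makespan)
    p99 = sorted(waits)[int(0.99 * len(waits))]
    return util, p99

util, p99 = simulate()
print(f"avg utilization: {util:.0%}, p99 queueing delay: {p99:.1f}s")
```

With these invented numbers the fleet sits near 25% average utilization, yet the 99th-percentile request still waits a full service time in queue.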
LLM GPU fleets can be analytically optimized into a two-pool architecture with gateway-layer compression, slashing costs by up to 82% without sacrificing latency.
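A back-of-the-envelope version of the two-pool cost argument, with hypothetical prices, traffic split, and compression ratio; it does not reproduce the paper's 82% figure.

```python
# Toy cost model for the two-pool idea: serve short-context traffic from a
# cheaper pool, and shrink long prompts with gateway-layer compression
# before dispatch. All rates and ratios below are placeholders.

def fleet_cost(short_frac, short_rate_cost, long_rate_cost, compression=1.0):
    """Relative cost per request mix; compression divides long-pool work."""
    long_frac = 1.0 - short_frac
    return short_frac * short_rate_cost + long_frac * long_rate_cost / compression

# Baseline: everything served by one long-context-capable pool.
baseline = fleet_cost(short_frac=0.0, short_rate_cost=1.0, long_rate_cost=1.0)

# Two pools: 70% of requests fit a pool that is 5x cheaper per request,
# and the gateway compresses long prompts ~2x before dispatch.
two_pool = fleet_cost(short_frac=0.7, short_rate_cost=0.2,
                      long_rate_cost=1.0, compression=2.0)

print(f"relative cost: {two_pool / baseline:.0%} of baseline")
```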