Forget buying new GPUs: clever context-length routing can boost LLM inference energy efficiency by 2.5x, dwarfing the 1.7x gain from upgrading to a B200.
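The idea can be sketched as a simple dispatcher that sends each request to the serving pool whose efficiency profile matches the prompt's context length. The pool names and token thresholds below are illustrative assumptions, not values from the source:

```python
def route_by_context_length(num_tokens: int) -> str:
    """Pick a serving pool by prompt length (thresholds are hypothetical)."""
    if num_tokens <= 512:
        # Short prompts: a smaller, energy-efficient GPU pool suffices.
        return "short-context-pool"
    elif num_tokens <= 4096:
        # Mid-length prompts: a balanced pool.
        return "mid-context-pool"
    # Long prompts: high-memory GPUs that amortize the KV-cache cost.
    return "long-context-pool"

if __name__ == "__main__":
    for n in (128, 2048, 32000):
        print(n, "->", route_by_context_length(n))
```

In a real deployment the thresholds would be tuned from measured energy-per-token curves on each hardware tier rather than fixed by hand.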