Stop overpaying for LLM serving: intelligently routing requests to specialized pools based on token budget slashes GPU costs by up to 42% and dramatically improves reliability.
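The core idea above can be sketched as a simple dispatcher: estimate each request's total token budget and send it to the cheapest pool that can serve it. The pool names, thresholds, and dataclass below are illustrative assumptions, not the article's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int    # tokens in the input prompt
    max_new_tokens: int   # generation budget requested by the caller

# Hypothetical pools, ordered cheapest-first: short requests go to a
# high-throughput pool, only long-context work lands on expensive GPUs.
POOLS = {
    "short":  {"max_budget": 512},
    "medium": {"max_budget": 4096},
    "long":   {"max_budget": float("inf")},
}

def route(req: Request) -> str:
    """Return the first (cheapest) pool whose budget covers the request."""
    budget = req.prompt_tokens + req.max_new_tokens
    for name, cfg in POOLS.items():
        if budget <= cfg["max_budget"]:
            return name
    return "long"  # fallback; unreachable with an inf-budget pool present
```

Keeping short requests off long-context hardware is where the cost savings come from: the expensive pool stays reserved for work that actually needs it.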