MBZUAI, ³McGill University, ⁴AMD, ⁵Red Hat. Corresponding author: Bowei.He@mbzuai.ac.ae
Seemingly idle LLM inference fleets can be silently broken, and this simulator helps you find out why before you buy.
LLM GPU fleets can be analytically optimized into a two-pool architecture with gateway-layer compression, slashing costs by up to 82% without sacrificing latency.