University of Chicago
Forget buying new GPUs – clever context-length routing can boost your LLM inference energy efficiency by 2.5x, dwarfing the 1.7x gain from upgrading to a B200.
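The routing idea is simple enough to sketch. Below is a minimal, hypothetical Python illustration of context-length routing at an inference gateway; the pool names, threshold, and per-token energy figures are invented for illustration and are not taken from the paper.

```python
# Sketch of context-length routing: short-context requests go to an
# energy-efficient pool; long contexts go to the high-memory pool.
# Pool names, threshold, and J/token numbers are hypothetical.

from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    joules_per_token: float  # hypothetical energy cost per generated token

SHORT_CTX_POOL = Pool("a100-efficiency", joules_per_token=0.4)
LONG_CTX_POOL = Pool("b200-capacity", joules_per_token=0.7)
CTX_THRESHOLD = 4096  # hypothetical routing cutoff, in prompt tokens

def route(prompt_tokens: int) -> Pool:
    """Pick a pool by context length; short prompts take the cheaper pool."""
    return SHORT_CTX_POOL if prompt_tokens < CTX_THRESHOLD else LONG_CTX_POOL

if __name__ == "__main__":
    for n in (512, 8192):
        pool = route(n)
        print(f"{n}-token prompt -> {pool.name} ({pool.joules_per_token} J/token)")
```

In practice the threshold would come from profiling, but the routing decision itself is a one-line predicate at the gateway, which is why it can beat a hardware upgrade on cost.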
LLM inference fleets that look idle can be silently broken, and this simulator helps you find out why before you buy.
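To make that claim concrete, here is a toy queueing simulation (not the paper's simulator) showing how a fleet can report low average utilization yet still queue badly under bursty traffic; every parameter here is made up.

```python
# Toy simulation: average utilization looks idle, but bursty arrivals
# still produce long queueing delays. All parameters are illustrative.

import heapq
import random

def simulate(servers=8, service_s=2.0, mean_gap_s=1.0, burst=10, n=5000, seed=0):
    rng = random.Random(seed)
    free_at = [0.0] * servers          # when each GPU next becomes free
    heapq.heapify(free_at)
    t, busy_time, waits = 0.0, 0.0, []
    for i in range(n):
        # Bursty arrivals: requests land in clumps of `burst` at each gap.
        if i % burst == 0:
            t += rng.expovariate(1.0 / mean_gap_s) * burst
        start = max(t, heapq.heappop(free_at))
        waits.append(start - t)
        heapq.heappush(free_at, start + service_s)
        busy_time += service_s
    makespan = max(free_at)
    util = busy_time / (servers * makespan)
    p99 = sorted(waits)[int(0.99 * len(waits))]
    return util, p99

util, p99 = simulate()
print(f"avg utilization: {util:.0%}, p99 queueing delay: {p99:.1f}s")
```

With these invented numbers the fleet sits near 25% average utilization, yet the 99th-percentile request still waits a full service time in queue.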
LLM GPU fleets can be analytically optimized into a two-pool architecture with gateway-layer compression, slashing costs by up to 82% without sacrificing latency.
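A back-of-the-envelope version of the two-pool cost argument, with hypothetical prices, traffic split, and compression ratio; it does not reproduce the paper's 82% figure.

```python
# Toy cost model for the two-pool idea: serve short-context traffic from a
# cheaper pool, and shrink long prompts with gateway-layer compression
# before dispatch. All rates and ratios below are placeholders.

def fleet_cost(short_frac, short_rate_cost, long_rate_cost, compression=1.0):
    """Relative cost per request mix; compression divides long-pool work."""
    long_frac = 1.0 - short_frac
    return short_frac * short_rate_cost + long_frac * long_rate_cost / compression

# Baseline: everything served by one long-context-capable pool.
baseline = fleet_cost(short_frac=0.0, short_rate_cost=1.0, long_rate_cost=1.0)

# Two pools: 70% of requests fit a pool that is 5x cheaper per request,
# and the gateway compresses long prompts ~2x before dispatch.
two_pool = fleet_cost(short_frac=0.7, short_rate_cost=0.2,
                      long_rate_cost=1.0, compression=2.0)

print(f"relative cost: {two_pool / baseline:.0%} of baseline")
```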