MBZUAI 3 McGill University
Stop overpaying for LLM serving: intelligently routing requests to specialized pools based on token budget slashes GPU costs by up to 42% and dramatically improves reliability.
DANCEMATCH enables efficient large-scale dance retrieval by creating compact, discrete motion signatures that capture the spatio-temporal structure of dance, moving beyond continuous embeddings.
Seemingly idle LLM inference fleets can be secretly broken, and this simulator helps you find out why before you buy.
LLM GPU fleets can be analytically optimized into a two-pool architecture with gateway-layer compression, slashing costs by up to 82% without sacrificing latency.