Moonshot AI
Decoupling LLM prefill and decode across datacenters is now practical, unlocking independent scaling and resource elasticity, thanks to a system that combines KV-efficient models with intelligent request scheduling.