Decoupling LLM prefill and decode across datacenters is now practical: a system that combines KV-efficient models with intelligent request scheduling unlocks independent scaling and resource elasticity.
Ditch static data paths: TENT dynamically slices and sprays LLM data across heterogeneous interconnects, self-healing in under 50 ms and boosting throughput by up to 36%.
Unstable explanations plague ML models on spectroscopy data, but SHAPCA offers a more consistent and interpretable approach by combining PCA and SHAP values in the original input space.
Double your LLM inference throughput by routing the KV cache through decoding engines to bypass the bandwidth bottleneck on prefill engines.