Unlock 2x faster LLM serving and slash warmup times by fusing kernels that gracefully handle dynamic shapes and data dependencies.
K-means, previously relegated to offline processing, gets a 17.9x speed boost on modern GPUs thanks to Flash-KMeans' clever I/O and contention optimizations.