K-means gets a 17.9x speed boost on modern GPUs thanks to a clever redesign that avoids memory bottlenecks and atomic write contention.
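To illustrate the general idea (not the paper's actual implementation): on GPUs, the K-means centroid update is often written as a scatter-add, where many threads atomically accumulate into the same centroid row and contend for memory. The same update can be expressed as a dense matrix multiply over one-hot assignments, which hardware executes without atomic writes. A minimal NumPy sketch, with both function names hypothetical:

```python
import numpy as np

def kmeans_update_scatter(X, labels, k):
    # Scatter-add form: each point accumulates into its centroid's row.
    # The GPU analogue uses atomicAdd, which serializes under contention.
    sums = np.zeros((k, X.shape[1]))
    counts = np.zeros(k)
    np.add.at(sums, labels, X)
    np.add.at(counts, labels, 1)
    return sums / counts[:, None]

def kmeans_update_matmul(X, labels, k):
    # Matmul form: a one-hot assignment matrix turns the accumulation
    # into a GEMM, which GPUs execute without atomic-write contention.
    onehot = np.eye(k)[labels]   # (n, k) one-hot assignments
    sums = onehot.T @ X          # (k, d) per-cluster sums
    counts = onehot.sum(axis=0)  # points per cluster
    return sums / counts[:, None]

# Both forms compute the same centroids.
X = np.array([[0., 0.], [0., 2.], [10., 10.], [10., 12.]])
labels = np.array([0, 0, 1, 1])
print(kmeans_update_matmul(X, labels, 2))  # [[0., 1.], [10., 11.]]
```

The one-hot matrix is shown densely for clarity; a practical kernel would use a sparse or tiled equivalent to avoid materializing it.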
SpargeAttention2 achieves 95% attention sparsity in video diffusion models with a 16.2x speedup, showing that trainable sparse attention can significantly outperform training-free methods without sacrificing generation quality.
Forget sparse KV caches: QuantSpec's hierarchical 4-bit quantization unlocks 2.5x speedups in long-context LLM inference with >90% draft acceptance rates in speculative decoding.