K-means gets a 17.9x speed boost on modern GPUs thanks to a clever redesign that avoids memory bottlenecks and atomic write contention.
Trainable INT8 attention can match full-precision attention during pre-training, but only with QK normalization and fewer tokens per step.
SpargeAttention2 achieves 95% attention sparsity in video diffusion models with a 16.2x speedup, showing that trainable sparse attention can significantly outperform training-free methods without sacrificing generation quality.
An 18.6x speedup in video diffusion models with 97% attention sparsity comes from learning to route and combine sparse and linear attention, outperforming heuristic approaches.