Trainable INT8 attention can match full-precision attention during pre-training, but only if you apply query-key (QK) normalization and reduce the number of tokens per step.
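As a rough illustration of that first ingredient, here is a minimal PyTorch-style sketch of attention with QK normalization applied before INT8 fake quantization. The symmetric per-tensor quantization, the RMSNorm placement, and the module names are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant_int8(x: torch.Tensor) -> torch.Tensor:
    """Symmetric per-tensor INT8 fake quantization (assumed scheme, for illustration)."""
    scale = x.abs().amax().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127)
    # Straight-through estimator: forward uses quantized values, backward is identity.
    return x + (q * scale - x).detach()

class QKNormInt8Attention(nn.Module):
    """Single-head attention with RMS-normalized Q/K before INT8 fake quantization.

    Hypothetical module illustrating the idea that normalizing Q and K keeps their
    dynamic range narrow enough for INT8 to track full-precision training.
    Requires PyTorch >= 2.4 for nn.RMSNorm.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.q_norm = nn.RMSNorm(dim)   # QK normalization
        self.k_norm = nn.RMSNorm(dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = fake_quant_int8(self.q_norm(self.q_proj(x)))
        k = fake_quant_int8(self.k_norm(self.k_proj(x)))
        v = self.v_proj(x)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v
```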
LLMs can now design GPU kernels that outperform both human experts and prior automated methods, thanks to a co-evolving world model that guides the search process.
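To make the "world model guides the search" idea concrete, here is a toy, self-contained sketch of surrogate-guided evolutionary search: a stand-in LLM proposes kernel variants, a toy predictor ranks them so only the most promising are actually benchmarked, and the predictor is refit on every new measurement. Every class and function below is a hypothetical stand-in, not the paper's system.

```python
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    source: str
    measured_ms: float | None = None

class ToyWorldModel:
    """Toy runtime predictor; refit as a running correction from measurement error."""
    def __init__(self) -> None:
        self.bias = 0.0
    def predict(self, source: str) -> float:
        return len(source) * 0.01 + self.bias
    def update(self, candidates: list[Candidate]) -> None:
        errs = [c.measured_ms - self.predict(c.source)
                for c in candidates if c.measured_ms is not None]
        if errs:
            self.bias += 0.5 * sum(errs) / len(errs)

def llm_propose(source: str, n: int = 3) -> list[str]:
    """Stand-in for an LLM call that rewrites a kernel (here: appends a random tweak)."""
    return [source + f"\n// tweak {random.randint(0, 999)}" for _ in range(n)]

def benchmark(source: str) -> float:
    """Stand-in for compiling and timing the kernel on a GPU."""
    return 10.0 + 0.02 * len(source) + random.random()

def search(seed: str, generations: int = 5, beam: int = 3) -> Candidate:
    world_model = ToyWorldModel()
    pool = [Candidate(seed, benchmark(seed))]
    for _ in range(generations):
        proposals = [Candidate(s) for p in pool for s in llm_propose(p.source)]
        # World model ranks proposals cheaply; only the top few get real measurements.
        proposals.sort(key=lambda c: world_model.predict(c.source))
        for cand in proposals[:beam]:
            cand.measured_ms = benchmark(cand.source)
        world_model.update(proposals[:beam])  # co-evolve the surrogate with the search
        pool = sorted(pool + proposals[:beam], key=lambda c: c.measured_ms)[:beam]
    return pool[0]
```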
A learned policy for routing and combining sparse and linear attention achieves an 18.6x speedup in video diffusion models at 97% attention sparsity, outperforming heuristic approaches.
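A minimal sketch of what learned routing between sparse and linear attention can look like: a per-token gate mixes a local-window (sparse) branch with a kernelized linear-attention branch. The gating granularity, the sparsity pattern, and the feature map are illustrative assumptions rather than the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoutedAttention(nn.Module):
    """Mixes a sparse (local-window) branch with a linear-attention branch via a
    learned per-token gate. Hypothetical sketch; the real method's routing
    granularity, sparsity pattern, and linear-attention kernel will differ."""

    def __init__(self, dim: int, window: int = 64):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.gate = nn.Linear(dim, 1)   # learned router: sparse vs. linear branch
        self.window = window
        self.scale = dim ** -0.5

    def sparse_attn(self, q, k, v):
        # Local-window sparsity stand-in: each query attends only to nearby keys.
        T = q.size(1)
        idx = torch.arange(T, device=q.device)
        mask = (idx[:, None] - idx[None, :]).abs() <= self.window
        scores = (q @ k.transpose(-2, -1)) * self.scale
        scores = scores.masked_fill(~mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    def linear_attn(self, q, k, v):
        # Non-causal linear attention with an ELU+1 feature map; cost grows linearly
        # with sequence length because keys/values are summarized once.
        q, k = F.elu(q) + 1, F.elu(k) + 1
        kv = k.transpose(-2, -1) @ v                             # (B, d, d) summary
        z = q @ k.sum(dim=1, keepdim=True).transpose(-2, -1) + 1e-6
        return (q @ kv) / z

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        g = torch.sigmoid(self.gate(x))                          # per-token routing weight
        return g * self.sparse_attn(q, k, v) + (1 - g) * self.linear_attn(q, k, v)
```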