Trainable INT8 attention can match full-precision attention during pre-training, but only if you apply QK normalization and reduce the number of tokens per step.
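For intuition, here is a minimal PyTorch sketch of the core idea: normalize Q and K so their dynamic range is bounded, then compute the score matmul in INT8 with INT32 accumulation. The per-tensor symmetric quantizer and all names are illustrative assumptions, not the paper's implementation, and this covers the forward pass only (training would also need a straight-through estimator for the rounding).

```python
# Illustrative sketch, not the paper's code: QK-normalized INT8 attention scores.
import torch

def quantize_int8(x: torch.Tensor):
    # Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127].
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def int8_attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # QK normalization: bound the dynamic range so INT8 loses little precision.
    q = q / q.norm(dim=-1, keepdim=True).clamp(min=1e-6)
    k = k / k.norm(dim=-1, keepdim=True).clamp(min=1e-6)
    q_i8, q_scale = quantize_int8(q)
    k_i8, k_scale = quantize_int8(k)
    # Integer matmul accumulated in INT32, then dequantized back to float.
    scores = q_i8.to(torch.int32) @ k_i8.to(torch.int32).transpose(-1, -2)
    return scores.float() * (q_scale * k_scale)

q, k = torch.randn(8, 64), torch.randn(8, 64)   # (tokens, head_dim)
probs = torch.softmax(int8_attention_scores(q, k) / 64**0.5, dim=-1)
```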
LLM-driven program evolution gets a smarter search loop: AdaEvolve dynamically allocates resources to promising solution candidates instead of following a static schedule.
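For flavor, the toy loop below shows one generic way to allocate budget adaptively: cheap, noisy evaluations for every candidate, with the budget doubled for the surviving half each round. This successive-halving stand-in is an assumption for illustration, not AdaEvolve's actual algorithm, and `evaluate` is a placeholder fitness function.

```python
# Generic adaptive-budget sketch, not AdaEvolve's algorithm.
import random

def evaluate(candidate: str, budget: int) -> float:
    # Stand-in fitness: pretend a larger budget yields a less noisy score.
    random.seed(hash((candidate, budget)))
    return len(candidate) + random.gauss(0, 1.0 / budget)

def adaptive_evolve(candidates: list[str], total_budget: int) -> str:
    budget = 1
    while len(candidates) > 1 and total_budget > 0:
        scored = [(evaluate(c, budget), c) for c in candidates]
        total_budget -= budget * len(candidates)
        scored.sort(reverse=True)
        # Keep the promising half; survivors earn a doubled evaluation budget.
        candidates = [c for _, c in scored[: max(1, len(scored) // 2)]]
        budget *= 2
    return candidates[0]

print(adaptive_evolve(["mutant_a", "mutant_bb", "mutant_ccc"], total_budget=64))
```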
LLMs can now design GPU kernels that outperform both human experts and prior automated methods, thanks to a co-evolving world model that guides the search process.
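The loop below sketches how a learned surrogate ("world model") can steer such a search: it ranks LLM-proposed candidates before expensive benchmarking and is refit on every measurement, so the model and the search improve together. Everything here (the nearest-neighbor surrogate, `propose`, `benchmark`) is a hypothetical stand-in, not the paper's method.

```python
# Hypothetical sketch of a surrogate-guided kernel search.
import random

class WorldModel:
    """Toy surrogate: predicts runtime from a crude feature (kernel length)."""
    def __init__(self):
        self.history: list[tuple[int, float]] = []

    def predict(self, kernel: str) -> float:
        if not self.history:
            return 0.0
        # Nearest-neighbor on the feature; stands in for a learned model.
        feat = len(kernel)
        return min(self.history, key=lambda h: abs(h[0] - feat))[1]

    def update(self, kernel: str, measured: float):
        self.history.append((len(kernel), measured))

def benchmark(kernel: str) -> float:
    # Stand-in for compiling, running, and timing the kernel on a GPU.
    return len(kernel) + random.gauss(0, 0.1)

def propose(n: int) -> list[str]:
    # Stand-in for an LLM proposing kernel variants.
    return ["kernel_" + "x" * random.randint(1, 20) for _ in range(n)]

model, best = WorldModel(), None
for step in range(5):
    candidates = sorted(propose(8), key=model.predict)
    for kernel in candidates[:2]:        # benchmark only the predicted-fastest
        t = benchmark(kernel)
        model.update(kernel, t)          # the model co-evolves with the search
        if best is None or t < best[1]:
            best = (kernel, t)
print(best)
```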
Learning how to route and combine sparse and linear attention delivers an 18.6x speedup in video diffusion models at 97% attention sparsity, outperforming heuristic approaches.
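A minimal sketch of learned routing, assuming the sparse and linear branch outputs are already computed: a per-token sigmoid gate mixes the two branches. The gating granularity and every name here are illustrative assumptions, not the paper's router.

```python
# Illustrative sketch, not the paper's router: gate between attention branches.
import torch
import torch.nn as nn

class AttentionRouter(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)  # one learned mixing weight per token

    def forward(self, x, sparse_out, linear_out):
        # Learn, per token, how much to trust each attention branch.
        w = torch.sigmoid(self.gate(x))          # (batch, tokens, 1)
        return w * sparse_out + (1 - w) * linear_out

# Example with stand-in branch outputs.
x = torch.randn(2, 16, 64)
router = AttentionRouter(dim=64)
mixed = router(x, torch.randn_like(x), torch.randn_like(x))
print(mixed.shape)  # torch.Size([2, 16, 64])
```

Because the gate is differentiable, routing can be trained end to end with the rest of the model rather than hand-tuned per layer, which is the advantage the blurb claims over heuristic approaches.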