Autonomous coding agents can now generate attention kernels that outperform expert-engineered ones on NVIDIA's latest Blackwell GPUs, discovering optimizations that eluded human experts.
FlashAttention-4 breaks through attention bottlenecks on Blackwell GPUs, reaching up to 71% hardware utilization and 2.7x speedups over Triton, driven by innovations such as software-emulated softmax and asynchronous MMA pipelines.