FlashAttention-4 tackles attention bottlenecks on Blackwell GPUs, reaching up to 71% hardware utilization and speedups of up to 2.7x over Triton through techniques such as software-emulated softmax and asynchronous MMA pipelines.