Tile-based accelerators can now achieve near-peak utilization for attention layers thanks to FlatAttention, which slashes HBM traffic and outperforms even optimized GPU implementations.