Multi-Head Low-Rank Attention (MLRA) delivers 2.8x faster distributed decoding by making its latent states partitionable across devices, overcoming the sharding bottleneck of prior latent attention methods.
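
One plausible reading of "partitionable latent states" is that each attention head keeps its own low-rank latent cache, so the cache carries a head axis that can be sharded exactly the way heads are in tensor parallelism. The numpy sketch below illustrates that idea only; all names (`W_down`, `W_up_k`, `W_up_v`), shapes, and the two-device split are assumptions for illustration, not MLRA's actual implementation.

```python
import numpy as np

# Hypothetical sketch: per-head low-rank latent caches that shard along the
# head axis. Shapes and names are illustrative assumptions, not the paper's.
rng = np.random.default_rng(0)
n_heads, d_model, d_latent, d_head, seq = 8, 512, 64, 64, 16

# Per-head low-rank projections: compress hidden states into a small latent,
# then re-expand the latent into keys/values at decode time.
W_down = rng.normal(size=(n_heads, d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.normal(size=(n_heads, d_latent, d_head)) / np.sqrt(d_latent)
W_up_v = rng.normal(size=(n_heads, d_latent, d_head)) / np.sqrt(d_latent)

h = rng.normal(size=(seq, d_model))            # token hidden states
latents = np.einsum("td,hdl->htl", h, W_down)  # cache: (n_heads, seq, d_latent)

# Because the latent cache has a head axis, it splits across devices the same
# way the heads do (here: 2 devices, 4 heads each). A single shared latent
# would instead have to be replicated on every device.
n_devices = 2
shards = np.split(latents, n_devices, axis=0)

# Each device decodes its own heads from its own latent shard, independently.
q = rng.normal(size=(n_heads, d_head))         # current-step query, per head
heads_per_dev = n_heads // n_devices
for dev, shard in enumerate(shards):
    lo = dev * heads_per_dev
    hi = lo + heads_per_dev
    k = np.einsum("htl,hlk->htk", shard, W_up_k[lo:hi])   # expand latent -> K
    v = np.einsum("htl,hlk->htk", shard, W_up_v[lo:hi])   # expand latent -> V
    scores = np.einsum("hk,htk->ht", q[lo:hi], k) / np.sqrt(d_head)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)               # softmax over seq
    out = np.einsum("ht,htk->hk", attn, v)                # per-head outputs
    print(f"device {dev}: latent shard {shard.shape}, output {out.shape}")
```

If this reading is right, the contrast with shared-latent designs (MLA-style, where all heads read one compressed cache) is that the shared latent forces every tensor-parallel rank to hold the full cache, while a per-head latent lets both compute and cache memory split cleanly across ranks.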