MLRA unlocks 2.8x faster LLM decoding by enabling efficient tensor parallelism for latent attention, sidestepping the memory traffic bottlenecks that plague existing methods.