Tucker Attention extracts an order of magnitude more parameter efficiency from attention layers, while unifying and simplifying Grouped-Query Attention (GQA), Multi-Head Latent Attention (MLA), and standard Multi-Head Attention (MHA).
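The blurb does not spell out the mechanism, but the name suggests a Tucker decomposition of the stacked per-head projection weights. A minimal NumPy sketch, under that assumption (all shapes, ranks, and names here are illustrative, not from the paper): the full query-projection tensor of shape heads × d_model × d_head is replaced by a small core contracted with three factor matrices, sharing parameters across heads; different sharing patterns in the head factor would recover GQA- or MLA-style structure.

```python
import numpy as np

# Hypothetical sketch (not the paper's actual parameterization):
# factor the stacked per-head query projections
#   W_Q  (heads x d_model x d_head)
# as a Tucker decomposition: core G (r1 x r2 x r3) contracted with
# factor matrices A (heads x r1), B (d_model x r2), C (d_head x r3).
rng = np.random.default_rng(0)
heads, d_model, d_head = 8, 64, 16
r1, r2, r3 = 4, 16, 8  # Tucker ranks, chosen for illustration

G = rng.normal(size=(r1, r2, r3))
A = rng.normal(size=(heads, r1))
B = rng.normal(size=(d_model, r2))
C = rng.normal(size=(d_head, r3))

# Reconstruct the full projection tensor from the compact factors.
W_Q = np.einsum('abc,ha,db,ec->hde', G, A, B, C)

# Compare parameter counts: the factors store far fewer numbers
# than the dense tensor they reconstruct.
full_params = heads * d_model * d_head
tucker_params = G.size + A.size + B.size + C.size
print(W_Q.shape, full_params, tucker_params)
```

At these toy sizes the factors hold 1,696 parameters versus 8,192 for the dense tensor; at model scale, and with the key/value/output projections factored the same way, the savings compound toward the order-of-magnitude claim.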