NVIDIA
MSA cuts per-token attention compute by more than 28x while maintaining competitive performance, making ultra-long contexts more practical for LLMs.