Flash-MaxSim makes late-interaction retrieval faster and cheaper: it cuts memory usage by 16x and speeds up inference by 4.7x on an H100 by never materializing the full similarity tensor.
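To make the "no similarity tensor" idea concrete, here is a minimal pure-Python sketch of MaxSim scoring (for each query token, take the maximum similarity over document tokens, then sum over query tokens). It is not the Flash-MaxSim implementation, and it says nothing about the reported speedups. The function names, the `block` parameter, and the toy vectors are all assumptions made for illustration. The first function builds the full query-by-document similarity matrix; the second keeps only one running maximum per query token while scanning document tokens in blocks, so the full matrix never exists in memory.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_materialized(query: List[Vector], doc: List[Vector]) -> float:
    """Reference version: builds the full |query| x |doc| similarity matrix."""
    sim = [[dot(q, d) for d in doc] for q in query]
    # Max over document tokens for each query token, then sum over query tokens.
    return sum(max(row) for row in sim)


def maxsim_streaming(query: List[Vector], doc: List[Vector], block: int = 2) -> float:
    """Illustrative version: scans document tokens in blocks and stores only
    one running maximum per query token (hypothetical sketch, assumes a
    non-empty document)."""
    running_max = [float("-inf")] * len(query)
    for start in range(0, len(doc), block):
        chunk = doc[start:start + block]
        for i, q in enumerate(query):
            for d in chunk:
                score = dot(q, d)
                if score > running_max[i]:
                    running_max[i] = score
    return sum(running_max)


# Toy example: two query tokens, three document tokens, dimension 2.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

full_score = maxsim_materialized(query, doc)
stream_score = maxsim_streaming(query, doc, block=2)
```

Both functions return the same score on the toy input. The streaming version's extra state grows with the number of query tokens rather than with the product of query and document lengths, which is the general reason a fused approach can save memory.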