Search papers, labs, and topics across Lattice.
1
0
3
4
Forget full attention: a hybrid sparse-linear attention model, MiniCPM-SALA, achieves 3.5x faster inference and supports 1M context length on a single GPU, all while maintaining comparable performance.