Learnable summary tokens that compress context make sub-linear attention possible while fully retaining long-range dependencies.
Unlike traditional recommender systems, generative recommendation models such as OneRec-V2 can be quantized to FP8 with near-lossless accuracy, unlocking significant latency and throughput improvements.