YOCO++ shows that the KV cache in LLMs can be halved while still outperforming a standard Transformer, thanks to a clever residual-connection trick.
Generative recommendation models can amplify popularity bias due to imbalanced tokenization, but a simple codebook-rebalancing strategy significantly improves performance.