YOCO++ shows that you can halve the KV cache size in LLMs and still outperform a standard Transformer, thanks to a clever residual connection trick.