Autoregressive inference gets a potential 14x speed boost without retraining, thanks to a clever trick of reusing attention weights within semantically coherent chunks.