Recurrent models can now achieve Transformer-competitive performance on recall-intensive tasks, thanks to a simple memory caching mechanism that grows memory capacity with sequence length.
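Below is a minimal sketch of one way a growing memory cache can back a recurrent model: a fixed-size hidden state carries local context while a key/value cache gains one slot per token, so recall capacity grows with sequence length. The class name, weight shapes, and dot-product readout are illustrative assumptions, not the specific mechanism behind this result.

```python
import numpy as np

class CachedRecurrentCell:
    """Toy recurrent cell whose memory cache grows by one slot per token.

    Illustrative sketch only: the constant-size hidden state handles local
    context, while the unbounded key/value cache supports exact recall over
    the full history.
    """

    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        scale = d_model ** -0.5
        self.W_h = rng.normal(scale=scale, size=(d_model, d_model))
        self.W_x = rng.normal(scale=scale, size=(d_model, d_model))
        self.W_k = rng.normal(scale=scale, size=(d_model, d_model))
        self.W_v = rng.normal(scale=scale, size=(d_model, d_model))
        self.h = np.zeros(d_model)   # fixed-size recurrent state
        self.keys = []               # cache grows linearly with sequence length
        self.values = []

    def step(self, x):
        # Standard recurrent update for the constant-size state.
        self.h = np.tanh(self.W_h @ self.h + self.W_x @ x)
        # Append one key/value slot per token, so capacity grows with length.
        self.keys.append(self.W_k @ x)
        self.values.append(self.W_v @ x)
        # Read from the cache with dot-product attention for recall.
        K = np.stack(self.keys)      # (t, d)
        V = np.stack(self.values)    # (t, d)
        scores = K @ self.h / np.sqrt(len(self.h))
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return self.h + weights @ V  # recurrent state plus retrieved memory
```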
Surprisingly, using only a single inner-loop update in data mixing can lead to failure, and the optimal number of inner-loop steps scales logarithmically with the parameter-update budget.
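As a rough illustration, the sketch below pairs an outer loop that re-weights the data mixture with an inner loop of k model updates, where k follows a logarithmic rule in the update budget. The constants, the rule `a + b * log(budget)`, and the callables `update_mixture` and `update_model` are placeholder assumptions; only the log-scaling form comes from the summary above.

```python
import math

def inner_steps_for_budget(budget, a=1.0, b=2.0):
    """Illustrative log-scaling rule: a and b are placeholders, not reported
    values; only the logarithmic dependence on the budget is assumed."""
    return max(2, round(a + b * math.log(budget)))

def bilevel_data_mixing(budget, update_mixture, update_model):
    """Sketch of a two-loop schedule: each outer step re-weights the data
    mixture, then the model takes k inner gradient steps under that mixture.
    A single inner step (k = 1) is the degenerate case flagged above."""
    k = inner_steps_for_budget(budget)
    steps_used = 0
    while steps_used + k <= budget:
        mixture = update_mixture()   # outer loop: adjust the data mixture
        for _ in range(k):           # inner loop: k parameter updates
            update_model(mixture)
        steps_used += k
```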