Search papers, labs, and topics across Lattice.
Microsoft
1
0
3
Recurrent memory can be added to transformers at scale with minimal parameter overhead and no performance penalty by reusing existing hidden states and training with interleaved parallel updates.