Transformers get a surprising boost in language-modeling performance by simply ignoring "themselves" during attention: each token is blocked from attending to its own position.
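Reading "ignoring themselves" as masking the diagonal of the self-attention score matrix (an assumption about the underlying method, not the paper's own code), here is a minimal NumPy sketch of attention where every token attends to all positions except its own; the function name and shapes are illustrative.

```python
import numpy as np

def attention_no_self(q, k, v):
    """Scaled dot-product attention with the diagonal masked out.

    q, k, v: arrays of shape (seq_len, d_model).
    Each position attends to every position except its own.
    Hypothetical sketch: names and setup are assumptions, not the paper's API.
    """
    seq_len, d_model = q.shape
    scores = q @ k.T / np.sqrt(d_model)           # (seq_len, seq_len) score matrix
    np.fill_diagonal(scores, -np.inf)             # token i cannot attend to token i
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability before exp
    weights = np.exp(scores)                      # exp(-inf) = 0, so the diagonal drops out
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Tiny usage example with shared q = k = v
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8)).astype(np.float32)
out = attention_no_self(x, x, x)
print(out.shape)  # (4, 8)
```

Note that this sketch is non-causal; in a causal language model the first token would have no remaining positions to attend to once its own is masked, so that edge case would need special handling.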