Attention might just be a cleverly disguised MLP: this work shows you can ditch the quadratic complexity and still get Transformer-level performance by dynamically predicting parameters in standard network layers.