Probabilistic Transformers can now scale to 0.4B parameters and beat standard Transformers of the same size, thanks to a hyperparameter transfer trick.