Turns out, the secret to Transformer efficiency might be hiding in plain sight: flatter loss landscapes implicitly encourage activation sparsity.