Search papers, labs, and topics across Lattice.
1
0
2
Transformers secretly act like the power method, concentrating token embeddings along a dominant eigenvector with each layer.