Attention sinks aren't just a forward-pass phenomenon; they actively warp the training landscape by creating "gradient sinks" that drive massive activations.