Constraining Transformer architectures to bounded representations and uniform attention bypasses grokking entirely on modular addition, suggesting that task-specific geometric alignment is key.
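A minimal sketch of such a constrained setup, assuming a standard PyTorch environment. "Uniform attention" is read here as replacing learned attention with a fixed uniform average over positions, and "bounded representations" as L2-normalizing hidden states; these readings, the modulus `P = 97`, and all module names are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

P = 97  # modulus for the modular-addition task (hypothetical choice)


class UniformAttentionBlock(nn.Module):
    """Transformer block whose attention is a fixed uniform average."""

    def __init__(self, d_model: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Uniform attention: every token attends equally to all positions,
        # so the attention output is just the mean over the sequence.
        attn_out = x.mean(dim=1, keepdim=True).expand_as(x)
        x = x + attn_out
        # Bounded representations: constrain hidden states to the unit sphere.
        x = F.normalize(x, dim=-1)
        x = x + self.ff(x)
        return F.normalize(x, dim=-1)


class ModularAdditionModel(nn.Module):
    def __init__(self, d_model: int = 128):
        super().__init__()
        self.embed = nn.Embedding(P, d_model)
        self.block = UniformAttentionBlock(d_model)
        self.unembed = nn.Linear(d_model, P)

    def forward(self, ab: torch.Tensor) -> torch.Tensor:
        # ab: (batch, 2) holding operands a and b; predict (a + b) mod P.
        h = F.normalize(self.embed(ab), dim=-1)
        h = self.block(h)
        return self.unembed(h.mean(dim=1))


# Tiny usage example over the full (a, b) grid.
a, b = torch.meshgrid(torch.arange(P), torch.arange(P), indexing="ij")
inputs = torch.stack([a.flatten(), b.flatten()], dim=-1)
targets = inputs.sum(dim=-1) % P
logits = ModularAdditionModel()(inputs)
print(F.cross_entropy(logits, targets).item())
```

Normalizing after each residual update keeps every hidden state on the unit sphere, which is one simple way to enforce the boundedness constraint while leaving the feed-forward path free to learn the task.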