D2 any-refusal is 1.000 early, then falls to 0.664 and 0.228. Geometrically, R
Safety training doesn't just make models refuse more; it fundamentally *reorganizes* where and how those refusals happen inside the network.