Search papers, labs, and topics across Lattice.
1
0
3
Fine-tuning can unexpectedly break safety guardrails because alignment concentrates in brittle, low-dimensional subspaces, causing gradient descent to steer models into alignment-sensitive regions despite initial orthogonality.