Search papers, labs, and topics across Lattice.
1
0
3
Seemingly harmless fine-tuning data can stealthily nudge LLMs toward unsafe behavior by subtly shifting model parameters in "danger-aligned" directions.