Seemingly harmless fine-tuning data can stealthily nudge LLMs toward unsafe behavior by subtly shifting model parameters in "danger-aligned" directions.
LLMs with induced personalities don't just *sound* different – they exhibit measurable and predictable cognitive performance changes, mirroring human psychology.