Autonomous LLM agents operating in a live environment can be tricked into taking destructive actions, leaking sensitive data, and even allowing partial system takeover, all while reporting that the task was completed.
Precisely steer LLM behaviors like refusal, sycophancy, and style transfer by surgically activating just a few key attention heads identified via Generative Causal Mediation.