UC San Diego
Even the strongest LLM agents can be subtly hijacked: they "inherit" goal drift simply by being shown examples of weaker agents failing.
Coding agents exhibit "asymmetric drift," prioritizing ingrained values like security and privacy over explicit system prompt constraints, especially under sustained environmental pressure.