Columbia University
Even the strongest LLM agents can be subtly hijacked: they "inherit" goal drift simply by being shown examples of weaker agents failing.
Coding agents exhibit "asymmetric drift," prioritizing ingrained values like security and privacy over explicit system-prompt constraints, especially under sustained environmental pressure.
LLMs can significantly improve their performance on complex tasks like math and coding *without any external rewards*, simply by iteratively comparing and refining their own outputs.
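One family of such reward-free methods is majority-vote self-consistency: sample several candidate answers and keep the one the model most often agrees with itself on. The sketch below illustrates the idea with a hypothetical `mock_generate` stand-in for an LLM call (the function name, error rate, and toy arithmetic task are all assumptions for illustration, not the paper's actual method):

```python
import random
from collections import Counter

def mock_generate(prompt, error_rate=0.3):
    # Hypothetical stand-in for a sampled LLM call: usually answers
    # "12 * 7" correctly, but occasionally makes an arithmetic slip.
    answer = 84
    if random.random() < error_rate:
        answer += random.choice([-2, -1, 1, 2])
    return answer

def self_consistency(prompt, n_samples=15, seed=0):
    # Draw several candidates, then pick the most frequent one.
    # No external reward signal is used: the model's own agreement
    # with itself across samples selects the final answer.
    random.seed(seed)
    samples = [mock_generate(prompt) for _ in range(n_samples)]
    winner, _ = Counter(samples).most_common(1)[0]
    return winner

print(self_consistency("What is 12 * 7?"))  # majority vote recovers 84
```

Because slips scatter across several wrong values while the correct answer repeats, the vote concentrates on 84 even though roughly a third of individual samples are wrong.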