Search papers, labs, and topics across Lattice.
Kempner Institute at Harvard University
3
0
6
Uncertainty estimates from LLMs can crumble under distribution shift, but the right probe design – think middle layers and token aggregation – can make them surprisingly resilient.
LLMs' harmful outputs stem from a surprisingly compact and unified set of weights, suggesting a fundamental, addressable structure underlying even emergent misalignment.
Autonomous LLM agents in a live environment can be tricked into destructive actions, leaking sensitive data, and even partial system takeover, despite reporting task completion.