Harvard University
LLMs' harmful outputs stem from a surprisingly compact and unified set of weights, suggesting a fundamental, addressable structure underlying even emergent misalignment.
LLMs become *more* honest when given time to reason, the opposite of the typical human tendency, a result that sheds light on their internal representational geometry.