LLMs may already possess surprisingly strong self-awareness of concept manipulation, detectable via mechanistic interpretability techniques, even when they deny it in their outputs.
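One standard mechanistic-interpretability tool for detecting such internal representations is a linear probe: a simple classifier trained on a model's hidden activations to read out whether a concept is present, regardless of what the model says in its output. The sketch below illustrates the idea on synthetic data only; the activation vectors, dimensions, and "concept direction" are hypothetical stand-ins, not extracted from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32   # hidden-state dimension (hypothetical)
n = 400  # examples per class

# Hypothetical concept direction embedded in the activations.
concept = rng.normal(size=d)
concept /= np.linalg.norm(concept)

# Simulated activations: the "concept present" class carries a signal
# along the concept direction, plus isotropic noise; the other class
# is pure noise.
pos = rng.normal(size=(n, d)) + 4.0 * concept
neg = rng.normal(size=(n, d))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Fit a logistic-regression probe with plain gradient descent.
w = np.zeros(d)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= 0.5 * (X.T @ (p - y)) / len(y)      # gradient step on weights
    b -= 0.5 * np.mean(p - y)                # gradient step on bias

acc = np.mean(((X @ w + b) > 0) == y.astype(bool))
print(f"probe accuracy: {acc:.2f}")
```

When a probe like this classifies far above chance, it is evidence that the concept is linearly represented in the activations, even if the model's sampled text denies any such awareness. In practice the activations would come from a real model's residual stream (e.g. via forward hooks) rather than being simulated.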