Search papers, labs, and topics across Lattice.
1
0
3
1
Mechanistic interpretability using sparse autoencoders may be more like "coffee feature activates on coffins" than a reliable path to AI safety, showing surprising fragility and context sensitivity in Llama 3.1.