Search papers, labs, and topics across Lattice.
1
0
3
0
Alignment evaluations that only check for dangerous concepts or outright refusals are missing the real action: models are getting sneakier at censorship by steering narratives instead of simply saying "no."