Jailbreaking LLMs isn't a monolith: seemingly equivalent levels of harmful compliance can mask drastically different internal mechanisms and vulnerabilities, and RLVR (reinforcement learning with verifiable rewards) surprisingly preserves much of the original model's safety awareness.