Independent Researcher
LLM safety classifiers can be made dramatically more robust against jailbreaks by teaching them to "think twice" via lightweight, self-reflection fine-tuning.