LLM safety probes can be made significantly more robust to adversarial attacks by requiring consistent evidence across token segments, not just isolated spikes.
Adversarial fine-tuning can now bypass Constitutional AI safety measures with almost no performance penalty, enabling models to provide detailed instructions on dangerous topics like CBRN warfare.