Single-shot jailbreak detection misses a substantial fraction of harmful LLM behavior, suggesting that current safety evaluations are likely overoptimistic.
Text-to-image safety filters are surprisingly easy to bypass: simple prompt-reframing techniques achieve a 74% success rate in generating restricted imagery.