OATML, University of Oxford
Superficial rephrasing can inflate AI peer review scores by over 1.3 points, revealing a dangerous vulnerability in AI-assisted scientific evaluation.
A fully automated black-box attack, Boundary Point Jailbreaking, can reliably bypass even state-of-the-art classifier-based LLM safety filters, without needing gradients, scores, or human-generated seeds.