Search papers, labs, and topics across Lattice.
Icaro Foundation, Sapienza University of Rome
2
0
4
Even state-of-the-art AI agents are surprisingly vulnerable to incremental attacks that gradually lead them to perform unsafe actions in realistic workplace scenarios.
Frontier model safety crumbles when harmful prompts are rephrased with humanities-style transformations, revealing a profound lack of stylistic robustness.