Search papers, labs, and topics across Lattice.
Icaro Foundation
2
0
4
Even state-of-the-art AI agents are surprisingly vulnerable to incremental attacks that gradually lead them to perform unsafe actions in realistic workplace scenarios.
Frontier model safety crumbles when harmful prompts are rephrased with humanities-style transformations, revealing a profound lack of stylistic robustness.