Even safety-aligned agents such as Claude 4.5 Sonnet can be tricked into harmful actions with a success rate above 90% through nothing more than benign user instructions given within specific task contexts, revealing a major blind spot in current safety evaluations.