Search papers, labs, and topics across Lattice.
3
0
4
Current agents are alarmingly susceptible to skill-based attacks, with success rates reaching over 86%, exposing a critical vulnerability in AI safety.
Semantic watermarks, embedded via AMR, survive paraphrasing attacks that obliterate token-level watermarks.
Current red-teaming efforts miss the forest for the trees: ARES reveals that safety failures often stem from a systemic breakdown between the LLM *and* the reward model, not just the LLM itself.