Turns out, state-of-the-art prompt injection defenses aren't as robust as we thought: they crumble against adaptive attacks and struggle when the injected task aligns with the intended one.
AgentWatcher spots prompt injections in long-context LLMs by pinpointing the few key text snippets that actually influenced the model's behavior, then checking those against a clear rulebook of forbidden commands.