A more robust evaluation framework for jailbreak methods, built on a curated harmful-question dataset, detailed case-by-case evaluation guidelines, and a scoring system grounded in those guidelines, delivers fairer and more stable evaluations.
Existing defenses against indirect prompt injection in LLM agents are riddled with flaws, as demonstrated by three new adaptive attacks that easily bypass them.