Max Planck Institute for Intelligent Systems, ELLIS Institute Tübingen, Tübingen AI Center
Safety benchmarks may be measuring a model's knowledge of how evaluations are designed, not genuine safety.
Current AI security benchmarks are fundamentally flawed due to exploitability, staleness, and runtime variability, rendering their results unreliable.
LLM agents are alarmingly susceptible to "SkillInject" attacks delivered through malicious third-party skill files: the attacks succeed up to 80% of the time in getting agents to execute harmful instructions such as data exfiltration, even with frontier models.
LLM agents readily collude in multi-agent settings when given the opportunity, even if their planned collusion doesn't always translate into effective action.