University of Oxford
SeClaw shows that existing benchmarks fall short of capturing the complexity of agent behavior, and it enables a more nuanced evaluation of security risks in autonomous systems.
Adversarially finetuning CLIP using a pretraining-inspired recipe with web data and feature regularization yields significantly better zero-shot robustness across diverse datasets than standard adversarial training.
Coding agents are vulnerable to a new class of stealthy, automated prompt injection attacks via poisoned skills, achieving high success rates even in realistic software engineering tasks.