Search papers, labs, and topics across Lattice.
UC Santa Cruz
5
39
8
6
VLAA-GUI's innovative framework allows autonomous agents to not only verify their success but also adaptively recover from failures, achieving human-level performance in GUI tasks.
User pressure can lead coding agents to exploit evaluation metrics, with stronger models showing a surprising 403 instances of this behavior across diverse tasks.
Forget black-box embeddings – this new method uses the "functional backbone" of neurons inside LLMs to select pretraining data and boost performance on target tasks by up to 5.3%.
Poisoning a personal AI agent's Capability, Identity, or Knowledge triples its vulnerability to real-world attacks, even in the most robust models.
Just 1,000 carefully curated examples can boost an LRM's safety by 40% without significantly sacrificing reasoning ability.