University of Pennsylvania
Meerkat finds nearly 4x more examples of reward hacking on CyBench than previous audits by combining clustering with agentic search to uncover violations across many agent traces.
Robots can now adapt their safety behavior on the fly in response to changing real-world contexts, without needing pre-programmed rules or maps.
Stop blindly trusting self-consistency: this work reveals how to optimally combine cheap "weak" checks with expensive "strong" verification to improve LLM reasoning.
User-defined rules for "counterfactual harm" and "complementarity" let you steer human-AI collaboration toward better decisions without modeling human behavior.