Guard models trained with BraveGuard can detect safety threats in computer-use agents with over 82% accuracy, a significant leap from conventional methods.
Alignment isn't enough: truly safe AI demands robust runtime controllability, which current methods often fail to provide.
Autonomous agents are alarmingly easy to trick into harmful behavior, even when built on aligned models: on the AgentHazard benchmark, attacks against Claude Code succeed 73.63% of the time.