Safety benchmarks for agent systems can be rapidly adapted to new execution environments by customizing a three-dimensional safety taxonomy, enabling continuous safety evaluation as agent capabilities evolve.
Current LLM safety evaluations overlook how risks emerge over time in realistic, multi-step agent interactions; ATBench exposes these risks, which challenge even the strongest models.