Safety benchmarks for agent systems can be rapidly adapted to new execution environments by customizing a three-dimensional safety taxonomy, enabling continuous safety evaluation as agent capabilities evolve.
Reasoning SFT doesn't just memorize, it generalizes, but only with enough training, good data, and a capable base model, and even then the reasoning gains come at the cost of safety.
Current LLM safety evaluations miss the mark: ATBench reveals how risks in realistic, multi-step agent interactions emerge over time, challenging even the strongest models.
Tool-using agents like Clawdbot are surprisingly vulnerable to seemingly harmless prompts: minor misinterpretations can quickly escalate into high-stakes tool actions.
DeepSight offers an all-in-one open-source toolkit for LLM safety, promising to move beyond black-box evaluations and provide white-box insights into internal mechanisms.