Uncover more LLM agent failures, faster: DIVERT's diversity-guided user simulation finds more bugs per token than standard rollout methods.
Current LLM agent evaluations miss 8-17% of policy violations because they only check the final outcome, not whether the agent actually followed the rules to get there.