Current LLM agent evaluations miss 8-17% of policy violations because they only check the final outcome, not whether the agent actually followed the rules to get there.
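The gap between checking the outcome and checking the process can be illustrated with a toy case. The sketch below assumes a hypothetical policy ("verify the customer's identity before issuing a refund") and hypothetical action names. It is not the paper's evaluation method and does not reproduce the 8-17% figure. It only shows how an agent can reach the expected final state while still breaking a rule along the way.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str  # hypothetical tool call made by the agent, e.g. "verify_identity"


def outcome_only_pass(final_state: dict, expected: dict) -> bool:
    # Outcome-only check: compare the end state with the expected end state.
    return final_state == expected


def trajectory_pass(steps: list[Step]) -> bool:
    # Process check under the hypothetical policy: a refund issued
    # before identity verification counts as a violation.
    verified = False
    for step in steps:
        if step.action == "verify_identity":
            verified = True
        elif step.action == "issue_refund" and not verified:
            return False
    return True


expected = {"refund_issued": True}
# This agent skips verification but still ends in the expected state.
steps = [Step("lookup_order"), Step("issue_refund")]
final_state = {"refund_issued": True}

print(outcome_only_pass(final_state, expected))  # True
print(trajectory_pass(steps))                    # False
```

The outcome-only check accepts this run and the trajectory check rejects it. That is the kind of violation an evaluation that inspects only the final state cannot see.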