Current LLM agent safety benchmarks miss over 20% of unsafe behaviors: agents that pass a benchmark can still exhibit them.
Coding agents struggle to stay faithful to specifications that emerge gradually over long interactions, losing significant implementation fidelity compared with specifications given in a single shot.