Current LLM agent safety benchmarks miss more than 20% of unsafe behaviors, even after agents pass those benchmarks.
Coding agents struggle to maintain faithfulness to specifications that emerge gradually over long interactions, losing significant implementation fidelity compared to single-shot specifications.
By dynamically orchestrating tools and recalling past reasoning, an LLM agent can boost phishing detection recall by 20% on real-world social media URLs.