Johns Hopkins University
Even state-of-the-art LLMs struggle to follow complex instruction hierarchies, achieving only ~40% accuracy when navigating conflicts across a dozen privilege levels in agentic tasks.
LLMs struggle to navigate the nuances of real-world rules, achieving only ~45% accuracy on a new benchmark of legal and policy reasoning tasks.