Search papers, labs, and topics across Lattice.
1
0
3
Current aggregate accuracy metrics hide critical failures in long-horizon AI agents, like retrieval's struggle with factual precision and a universal inability to abstain, demanding a shift towards multi-axis evaluation.