Search papers, labs, and topics across Lattice.
1
0
3
Even GPT-5 only achieves 22.6% accuracy on the hardest part of this new Agentic RAG benchmark, revealing a surprising brittleness in multi-hop reasoning.