Today's best LLMs fail spectacularly at long-horizon reasoning, achieving under 10% accuracy on a new benchmark designed to isolate this critical capability.