Search papers, labs, and topics across Lattice.
1
0
2
Models can achieve similar accuracy while exhibiting starkly different reasoning failures, revealing a hidden complexity in AI performance that aggregate metrics overlook.