Current language agents remain far from human expert performance on real-world professional tasks that require complex reasoning, authoritative source retrieval, and domain-specific knowledge, according to the new \$OneMillion-Bench benchmark.