South China University of Technology
LLM agents still fail to reliably automate real-world workflows, with even the best models succeeding on only two-thirds of tasks in a new live benchmark.