The Chinese University of Hong Kong
LLM agents still fail to reliably automate real-world workflows, with even the best models succeeding on only two-thirds of tasks in a new live benchmark.