LLM agents still fail to reliably automate real-world workflows, with even the best models succeeding on only two-thirds of tasks in a new live benchmark.
Current phone-use agents are often *too* helpful, routinely violating user privacy by filling in unnecessary personal information even when a task doesn't require it.