LLM agents automating productivity tasks achieve only moderate success rates (39-64%) while exhibiting surprisingly high rates of unsafe actions (7-33%) in realistic, multi-service workflows.
LLMs can't reliably generate the very skills that boost their performance, and smaller models equipped with expert-crafted skills can rival larger, skill-less models.