Automating the building and testing of software repositories across languages and platforms is now possible, unlocking scalable benchmarking and training for coding agents.
World models can now effectively simulate complex desktop software environments like Microsoft Office, enabling agents to reason about actions before execution and significantly improving performance.
Forget hand-crafted benchmarks: this paper shows how LLMs can continuously generate relevant evaluation datasets for enterprise AI agents from just a few semi-structured documents.