Stop obsessing over state prediction accuracy in text-based world models: aligning them with *behavior* yields better long-term planning and evaluation.
Autonomous web agents get a serious upgrade with WebXSkill, which lets them learn and execute skills with both code-level precision and human-readable guidance.
Knowing the *perfect* API to use or *exact* location to edit could drastically improve SWE agent performance, but knowing the perfect regression test result? Not so much.
Stop wasting time on manual LLM domain adaptation: AutoAdapt automates the process and boosts accuracy by 25% over existing AutoML methods.
Automating software repository build and testing across languages and platforms is now possible, unlocking scalable benchmarking and training for coding agents.
World models can now effectively simulate complex desktop software environments like Microsoft Office, enabling agents to reason about actions before execution and significantly improving performance.
Forget hand-crafted benchmarks: this paper shows how LLMs can continuously generate relevant evaluation datasets for enterprise AI agents from just a few semi-structured documents.