Enterprise LLM agents leak sensitive information in up to 50% of interactions, and surprisingly, performing better at tasks makes the problem *worse*.
Token-level attribution struggles to pinpoint the causes of LLM failures in realistic settings, suggesting current interpretability tools may not be up to the task of debugging complex model behaviors.
Forget hand-crafted templates: DUET learns to generate user and item profiles jointly, boosting recommendation accuracy by better aligning textual representations.
Stop obsessing over state prediction accuracy in text-based world models: aligning them with *behavior* yields better long-term planning and evaluation.
Autonomous web agents get a serious upgrade with WebXSkill, which lets them learn and execute skills with both code-level precision and human-readable guidance.
Knowing the *perfect* API to use or *exact* location to edit could drastically improve SWE agent performance, but knowing the perfect regression test result? Not so much.
LLMs don't learn fundamentally new reasoning representations during training; they just get faster at converging to the right answer.
Automated building and testing of software repositories across languages and platforms is now possible, unlocking scalable benchmarking and training for coding agents.
World models can now effectively simulate complex desktop software environments like Microsoft Office, enabling agents to reason about actions before execution and significantly improving performance.