Enterprise LLM agents leak sensitive information in up to 50% of interactions, and counterintuitively, agents that perform better at their tasks leak *more*.
Token-level attribution struggles to pinpoint the causes of LLM failures in realistic settings, suggesting current interpretability tools may not be up to the task of debugging complex model behaviors.
Forget hand-crafted templates: DUET learns to generate user and item profiles jointly, boosting recommendation accuracy by better aligning textual representations.
Knowing the *perfect* API to use or *exact* location to edit could drastically improve SWE agent performance, but knowing the perfect regression test result? Not so much.
LLMs don't acquire fundamentally new reasoning representations during training; they simply converge to the right answer faster using representations they already have.
World models can now effectively simulate complex desktop software environments like Microsoft Office, enabling agents to reason about actions before execution and significantly improving performance.
Ditch the army of task-specific models: AdNanny shows a single, reasoning-centric LLM can handle diverse offline advertising tasks with improved accuracy and reduced manual effort.