Even GPT-5 and Gemini 2.5 Pro still fail to efficiently couple reasoning with tool use, requiring up to 2.7x more tool calls than the theoretical optimum in a new diagnostic environment.
LLMs in embodied environments get a massive boost from structured rules, with rule retrieval alone contributing +14.9 pp to single-trial success.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
LLMs can navigate complex 3D environments more effectively and with far fewer tokens by using a hierarchical scene graph representation derived from omnidirectional sensor data.
Learn a critic for coding agents from human-in-the-loop interaction traces alone, sidestepping the need for dense, verifiable rewards.
HALyPO stabilizes human-robot collaboration by directly certifying the convergence of decentralized policy learning in parameter space, sidestepping the oscillations that plague standard MARL approaches.
Forget hand-engineering world models – this work proves that competent agents *must* internally represent the world in a structured, predictive way to minimize regret under uncertainty.
AI tools are surprisingly bad at classifying the cognitive demand of math problems, with accuracy barely above chance and a systematic bias towards average difficulty, raising concerns about their utility in supporting teachers.
Today's frontier LLMs can't autonomously patch critical zero-day vulnerabilities, revealing a significant gap in their cyberdefense capabilities.
Injecting knowledge graphs into LLMs boosts medical question generation by 8%, suggesting a simple way to patch up LLM knowledge gaps.
By decomposing long-horizon manipulation into transport and object-centric interaction, LiLo-VLA achieves state-of-the-art zero-shot generalization and robustness, outperforming end-to-end VLA models by a large margin.
Injecting LLMs into rule-based dialogue systems for learner reflection can boost the depth of insights, but risks disengagement due to repetitiveness and misalignment.
Modularity in HRI isn't just about interchangeable parts; it's a powerful design medium for fostering long-term, evolving relationships between humans and robots.
General-purpose LLM agents stumble badly when faced with the messy reality of diverse, multi-domain tasks, and simply scaling interactions or parallel sampling doesn't fix it.
LLMs can turn sparse rewards into dense training signals for RL agents, achieving comparable performance with significantly higher sample efficiency.
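One standard way to instantiate the sparse-to-dense idea (not necessarily this paper's method) is potential-based reward shaping with an LLM-scored progress potential; in this minimal sketch, `llm_progress` is a stub standing in for an actual model call, and all names are illustrative.

```python
# Sketch: densify a sparse reward via potential-based shaping, using an
# (stubbed) LLM progress score as the potential phi. The shaping term
# gamma * phi(s') - phi(s) adds dense signal without changing the optimal
# policy, a known property of potential-based shaping.

def llm_progress(state: str) -> float:
    # Stub: a real system would prompt an LLM to rate progress in [0, 1].
    return min(1.0, state.count("subgoal_done") / 3.0)

def shaped_reward(sparse_r: float, state: str, next_state: str,
                  gamma: float = 0.99) -> float:
    """Dense reward = sparse env reward + gamma * phi(s') - phi(s)."""
    return sparse_r + gamma * llm_progress(next_state) - llm_progress(state)

# A transition completing one of three subgoals gets a positive shaping
# bonus even though the environment reward is zero.
bonus = shaped_reward(0.0, "", "subgoal_done")
```

The potential-based form is chosen here because it preserves the optimal policy while still giving the agent per-step feedback.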
Stop guessing when humans want to take over: modeling user intervention styles in web agents boosts their usefulness by 26.5%.
Forget training on narrow GitHub issues – Hybrid-Gym unlocks surprisingly broad coding skills by teaching agents to explore codebases and design architectures in synthetic environments.
An educational RAG system achieves 84% accuracy in answering student questions with minimal human editing, suggesting a practical path towards scalable AI-assisted teaching.
Forget slow text-based communication: Vision Wormhole unlocks faster multi-agent reasoning by turning VLMs into telepathic hubs, slashing runtime without sacrificing fidelity.
Multimodal agents still struggle with game development, solving only ~50% of tasks in a new benchmark, GameDevBench, highlighting the need for better multimodal reasoning in complex software environments.
Forget context window limits: this RL method uses LLM-generated summaries to train agents for long-horizon tasks, achieving higher success rates with less context.
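A minimal, framework-agnostic sketch of the summaries-for-context pattern: keep the most recent turns verbatim and compress everything older into a summary. `summarize` is a stub for the LLM call; function names are illustrative, not from the paper.

```python
# Sketch: bound the agent's context by replacing older transcript turns
# with an (stubbed) LLM-generated summary, so long-horizon tasks fit in a
# fixed window.

def summarize(turns: list[str]) -> str:
    # Stub: a real system would ask an LLM to compress these turns.
    return "SUMMARY: " + "; ".join(t.split(":")[0] for t in turns)

def compact_context(turns: list[str], budget: int) -> list[str]:
    """Keep the last `budget` turns verbatim; summarize the rest."""
    if len(turns) <= budget:
        return turns
    old, recent = turns[:-budget], turns[-budget:]
    return [summarize(old)] + recent
```

Keeping recent turns verbatim matters because the next action usually depends on fine-grained details of the latest observations, while older steps can tolerate lossy compression.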
LLMs struggle to track state across multiple tool-use steps, but a surprisingly simple fix—restating prior variable values—yields substantial performance gains.
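The restatement fix can be sketched as a prompt-construction helper that explicitly re-lists known variable bindings before each tool-use step, so the model need not track state implicitly across a long transcript. All names here (`restate_state`, `build_step_prompt`) are illustrative, not the paper's API.

```python
# Sketch: before each tool-use step, append an explicit restatement of the
# variable values produced by earlier steps.

def restate_state(variables: dict) -> str:
    """Render known variable bindings as an explicit reminder block."""
    if not variables:
        return ""
    lines = [f"- {name} = {value!r}" for name, value in variables.items()]
    return "Current variable state:\n" + "\n".join(lines)

def build_step_prompt(task: str, history: list[str], variables: dict) -> str:
    """Compose the prompt for the next tool call, restating prior state."""
    parts = [task, *history]
    reminder = restate_state(variables)
    if reminder:
        parts.append(reminder)  # the simple fix: explicit restatement
    return "\n\n".join(parts)

# Example: after two tool calls, the agent's known bindings are restated.
prompt = build_step_prompt(
    "Refund the user's most recent order.",
    ["tool: lookup_user -> user_id=42", "tool: get_order -> status='shipped'"],
    {"user_id": 42, "order_status": "shipped"},
)
```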