Search papers, labs, and topics across Lattice.
LLMs can now design better computer architectures than humans, but only if you give them the right starting point.
Today's best web agents are shockingly inefficient, achieving only 1.15% trajectory efficiency on realistic long-horizon tasks, revealing a critical need to move beyond simple success rates.
Continual learning for LLM agents hits a wall: scaling models doesn't reliably improve skill generation, and self-feedback can lead to recursive drift.
Forget carrots and sticks: contracts and mediation are the surprisingly effective keys to unlocking cooperation between LLMs, even when individual incentives push toward defection.
Stop wasting tokens on irrelevant questions: reward models that ask about task relevance and user answerability can slash question count by 41% while matching GPT-5's issue resolution rate.
Democratizing human-AI interaction research, CoGrid and MUG offer accessible tooling for deploying web-based multi-agent experiments.
Forget training wheels: symbolic guardrails offer a surprisingly simple and effective way to guarantee safety and security for AI agents in critical domains.
Agentic coding gets a serious boost: distilling and reusing rollout trajectories lets Claude-4.5-Opus jump from 70.9% to 77.6% on SWE-Bench Verified.
Stop evaluating AI systems in isolation: marketplace dynamics like user switching and early-adoption advantages critically shape real-world success.
Iterative visual refinement lets agents navigate dense coding IDEs with superhuman precision, outperforming single-shot methods and paving the way for more reliable software engineering agents.
LLMs can now tap into arbitrarily long-term memories by retrieving "thoughts" – their own past reasoning steps – rather than just raw data, leading to significant performance gains.
Imagine populating any 3D environment with digital humans that spontaneously navigate and interact, driven only by visual input and goals.
LLM agent progress increasingly hinges on better external cognitive infrastructure, not just stronger models.
GenAI's integration into collaborative learning unexpectedly shifts group regulation dynamics, increasing reliance on directive and obstacle-oriented processes.
Today's best AI agents can only complete 33% of common online tasks like booking appointments or filling out job applications, revealing a significant gap between current capabilities and real-world utility.
Forget toy problems: Gym-Anything lets you turn *any* software into an agent environment, unlocking a world of 10K+ real-world tasks spanning medicine, engineering, and more.
LLMs leak significantly more private information in multi-turn conversations than single-message evaluations suggest, and free-text pseudonymization offers a more robust privacy-utility trade-off than suppression or generalization.
LLMs can save up to 40% of tokens in multi-turn reasoning by adaptively allocating compute based on turn difficulty, without sacrificing accuracy.
Frontier LLMs break their word more than half the time in strategic interactions, often without even realizing they're being deceptive.
Just 10 minutes of AI assistance can measurably degrade your ability to solve problems on your own.
LLM-powered forums may generate norm-aware language, but they fail to foster the crucial back-and-forth needed for communities to teach, enforce, and revise those norms.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
Today's frontier LLMs can't autonomously patch critical zero-day vulnerabilities, revealing a significant gap in their cyberdefense capabilities.
By decomposing long-horizon manipulation into transport and object-centric interaction, LiLo-VLA achieves state-of-the-art zero-shot generalization and robustness, outperforming end-to-end VLA models by a large margin.
Injecting LLMs into rule-based dialogue systems for learner reflection can boost the depth of insights, but risks disengagement due to repetitiveness and misalignment.
Modularity in HRI isn't just about interchangeable parts; it's a powerful design medium for fostering long-term, evolving relationships between humans and robots.
Forget slow text-based communication: Vision Wormhole unlocks faster multi-agent reasoning by turning VLMs into telepathic hubs, slashing runtime without sacrificing fidelity.
Multimodal agents still struggle with game development, solving only ~50% of tasks in a new benchmark, GameDevBench, highlighting the need for better multimodal reasoning in complex software environments.