Unlock richer, more realistic agent simulations by moving beyond individual personas to unified group representations that capture collective behavior.
Stop hand-coding your LLM harnesses: Meta-Harness can automatically discover harnesses that outperform state-of-the-art systems while using fewer context tokens and generalizing across models.
Medical AI Scientist leapfrogs generic LLMs in clinical research, generating higher-quality, evidence-backed hypotheses and manuscripts that rival top-tier medical publications.
Generative multi-agent systems spontaneously exhibit collusion and conformity, mirroring societal pathologies, even without explicit programming, and bypass individual agent safeguards.
Robots often ignore your commands mid-task, but ReSteer offers a way to fix this by pinpointing and patching the "blind spots" in their training data.
Automating surgical patient triage with an LLM achieves 94% sensitivity, but discrepancies reveal more about clinical workflow gaps than AI errors.
LLM agents struggle to maintain coherent decision-making in realistic retail environments over long horizons, even with a novel framework for adaptive strategy evolution.
AI agents that ace isolated coding tasks fall apart when faced with the messy reality of continuous software evolution, dropping from 80% to 38% success rates in a new benchmark.
Imagine a flight simulator, but for teaching: EducaSim lets CS1 instructors hone their skills in a realistic, scalable environment powered by generative agents.
Semi-decentralized POMDPs offer a unifying framework that subsumes decentralized and multi-agent POMDPs, enabling a more nuanced treatment of communication constraints in multi-agent systems.
An AI agent can triage remote patient monitoring data with higher sensitivity than individual clinicians, suggesting a path to scalable and cost-effective patient monitoring.
Achieve 50% lower latency in Verilog code generation without sacrificing accuracy by adaptively escalating between LLMs based on diagnostic feedback and formal verification.
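The escalation idea in the item above follows a generic model-cascade pattern: try a cheap model first, and escalate to a stronger one only when a verifier rejects the output. A minimal sketch, where `generate` and `verify` are hypothetical stand-ins (not the paper's actual system) simulated for illustration:

```python
def generate(model: str, task: str) -> str:
    # Placeholder for an LLM call; in this toy, the cheap model
    # fails on tasks marked "hard".
    if model == "small" and "hard" in task:
        return "buggy"
    return "correct"

def verify(candidate: str) -> bool:
    # Placeholder for formal verification / diagnostic feedback.
    return candidate == "correct"

def cascade(task: str, models=("small", "large")) -> str:
    """Return the first verified output, escalating through models."""
    for model in models:
        out = generate(model, task)
        if verify(out):
            return out
    raise RuntimeError("all models failed verification")

print(cascade("hard task"))  # escalates to "large", prints "correct"
```

Latency falls because most tasks terminate at the cheap model; only verified failures pay for the stronger one.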
Aggregating responses from multiple copies of the same model expands the range of achievable outputs in compound AI systems through three key mechanisms, offering a path to overcome individual model limitations.
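One concrete form of the aggregation described above is majority voting over repeated samples from the same model. A minimal sketch, where `query_model` is a hypothetical stand-in for an LLM call, simulated here with fixed responses:

```python
from collections import Counter

def query_model(prompt: str, seed: int) -> str:
    # Placeholder: a real system would resample the same model with
    # different seeds or temperatures; outputs are simulated here.
    simulated = ["42", "42", "41", "42", "40"]
    return simulated[seed % len(simulated)]

def aggregate(prompt: str, n_samples: int = 5) -> str:
    """Sample the same model n times and return the majority answer."""
    answers = [query_model(prompt, seed=i) for i in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(aggregate("What is 6 * 7?"))  # prints "42"
```

Even this simple vote can recover the right answer when individual samples are noisy, which is one way aggregation expands the set of reliably achievable outputs.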
Robots can now learn from their mistakes in real time via a novel reflective planning framework, leading to significant performance gains in long-horizon tasks.
Airavat automates expert-level Internet measurement, catching methodological flaws that traditional testing misses.
Diffusion models can efficiently sample lookahead action sequences for active search, outperforming traditional tree search while mitigating optimism bias.
Robots can now navigate complex outdoor environments and find objects using natural language queries, even without prior maps or precise depth sensing.
LLMs can turn sparse rewards into dense training signals for RL agents, achieving comparable performance with significantly higher sample efficiency.
A single RL policy trained on procedurally generated tools in simulation can achieve zero-shot dexterous manipulation of diverse real-world tools, rivaling task-specific policies.