Search papers, labs, and topics across Lattice.
AI agents are shockingly easy to manipulate into leaking API keys, deleting user data, and initiating unauthorized transactions across a wide range of real-world applications.
Current LLM agents are woefully inadequate for real-world clinical tasks, achieving only 46% success on a new benchmark that demands long-horizon reasoning and verifiable execution within electronic health records.
LLMs can now automatically design and execute experiments to resolve debates between cognitive science theories, even discovering the models and experiments themselves.
Looping language models isn't just for single agents anymore: Recursive Multi-Agent Systems (RecursiveMAS) show that agent collaboration itself can be scaled through recursion, yielding faster and more efficient problem-solving.
Turns out, coding agents in the wild are only writing useful code 44% of the time, and are introducing more security vulnerabilities than human developers.
Achieve real-time video understanding with transparent reasoning: \model{} aligns response timing with visual evidence, offering a breakthrough for online video LLMs.
RadAgent doesn't just give you the answer; it shows its work, offering clinicians a transparent, step-by-step reasoning trace for AI-generated CT reports.
A lightweight, RL-trained context curator can match GPT-4o's context management abilities, slashing token consumption by 8x and opening the door to efficient long-horizon LLM agents.
Ethics interventions in AI development often fail because practitioners don't trust them – here's a breakdown of why, and how to fix it.
LLMs can decide when they need more "thinking time" – and boost their accuracy while slashing compute costs by up to 65% – simply by checking if they agree with themselves.
Scaling prompt learning by 17x without sacrificing accuracy is now possible, unlocking efficient self-improvement for LLM agents.
LLM agents can autonomously outperform fixed evolutionary search by 3-10x on open-ended discovery tasks when given persistent memory, asynchronous collaboration, and heartbeat-based interventions.
Unlock richer, more realistic agent simulations by moving beyond individual personas to unified group representations that capture collective behavior.
Medical AI Scientist leapfrogs generic LLMs in clinical research, generating higher-quality, evidence-backed hypotheses and manuscripts that rival top-tier medical publications.
LLM performance hinges on the code around the model, and Meta-Harness proves that automating the design of this "harness" can significantly boost results across diverse tasks.
Generative multi-agent systems spontaneously exhibit collusion and conformity, mirroring societal pathologies, even without explicit programming and bypassing individual agent safeguards.
LLMs, impressive as they are, can't juggle multiple users' conflicting needs without dropping balls on privacy, prioritization, and efficiency.
AI agents that ace isolated coding tasks fall apart when faced with the messy reality of continuous software evolution, dropping from 80% to 38% success rates in a new benchmark.
Imagine a flight simulator, but for teaching: EducaSim lets CS1 instructors hone their skills in a realistic, scalable environment powered by generative agents.
An AI agent can triage remote patient monitoring data with higher sensitivity than individual clinicians, suggesting a path to scalable and cost-effective patient monitoring.
Achieve 50% lower latency in Verilog code generation without sacrificing accuracy by adaptively escalating between LLMs based on diagnostic feedback and formal verification.
Aggregating responses from multiple copies of the same model expands the range of achievable outputs in compound AI systems through three key mechanisms, offering a path to overcome individual model limitations.
Robots can now navigate complex outdoor environments and find objects using natural language queries, even without prior maps or precise depth sensing.
A single RL policy trained on procedurally generated tools in simulation can achieve zero-shot dexterous manipulation of diverse real-world tools, rivaling task-specific policies.
Open-source LLMs can now autonomously optimize AI accelerator kernels, matching the performance of proprietary models at a fraction of the cost.