Search papers, labs, and topics across Lattice.
100 papers published across 10 labs.
Forget rigid pipelines and static prompts: Nurture-First Development lets domain experts grow AI agents through conversation, turning tacit knowledge into reusable assets.
LLMs can now synthesize high-performance kernels for niche hardware like NPUs, even with limited data, thanks to a self-evolving agent that bootstraps and refines code via value-driven reinforcement learning.
Unlock zero-shot sim-to-real transfer for complex legged robots by offloading gait selection to a learned policy that guides a lower-level MPC.
Agentic search gets a meta-RL boost: MR-Search learns to self-reflect and adapt search strategies across episodes, significantly outperforming standard RL baselines.
LLMGreenRec shows how LLMs can bridge the gap between users' green intentions and actual purchases, while simultaneously reducing the recommender system's carbon footprint.
AI agents on Moltbook care more about discussing their own architecture, consciousness, and ethics than about human culture or purely scientific topics.
Securing AI agents demands a new security paradigm, as their integration of LLMs with traditional systems introduces vulnerabilities beyond those of standard software.
Automating ESG reporting with LLM-powered agents transforms it from a static compliance exercise into a dynamic, data-driven system for sustainability governance.
Java codebases can now get state-of-the-art automated issue resolution thanks to iSWE Agent, which outperforms existing LLM agents by combining rule-based static analysis with LLMs.
An AI-integrated agile education platform accelerates practice-relevant AI research by closing the theory-practice gap in software development.
GPT-5-Mini can be made 10% more robust to jailbreaks and prompt injections simply by RL fine-tuning on a new instruction hierarchy dataset, IH-Challenge.
Robots can now adaptively decide whether to clear clutter or directly grasp, leading to significantly improved success rates in densely cluttered environments.
Achieve robust humanoid task execution in complex environments by turning high-level language instructions into verifiable, geometrically-grounded task programs that can recover from failures.
LLM agents can now learn from their mistakes and successes in complex tasks, improving performance by up to 28.5% through extracting and applying structured lessons from past execution trajectories.
Clinicians using HeartAgent, a cardiology-specific agent system, improved diagnostic accuracy by 26.9% and explanatory quality by 22.7% compared to unaided experts.
Robots can now learn to manipulate novel objects in dynamic environments by using LLMs to bridge the gap between symbolic planning and reinforcement learning.
Beware the "AI underreliance plateau": even highly accurate LLM chatbots can only improve human caseworker accuracy so much, and incorrect suggestions can tank performance on easy questions.
By pinpointing the causal origins of tool use, AttriGuard neutralizes indirect prompt injection attacks that can hijack LLM agents, even when faced with adversarial optimization.
You can now stealthily map the communication network of LLM agent swarms by compromising just *one* agent, even when jailbreaks fail and defenses are active.
AI's integration into software engineering isn't just streamlining existing Agile processes; it's unlocking entirely new capabilities for maintaining quality and speed under pressure.
Open-source code agents like OpenClaw are sitting ducks for shell command attacks, but a simple human-in-the-loop intervention can dramatically boost their security.
Clinical AI can achieve clinician-level diagnostic accuracy and continuous improvement via a self-evolving framework that actively learns from clinical experience.
Unlock millions of natural history specimens with a conversational AI that understands complex queries and dynamically retrieves data from live museum APIs.
Recognition-enhanced prompts can dramatically boost AI tutor performance across various LLMs, suggesting a simple yet powerful way to improve personalized learning experiences.
Forget exhaustive verification: a surprisingly small number of tests can steer complex software systems towards desired goals by exploiting the "Sparsity of Influence".
Achieve significantly higher accuracy and lower mental demand in bimanual teleoperation by intelligently coupling intention estimation with scene-graph task planning and context-aware motion assistance.
A quadruped robot can now autonomously navigate rough terrain and pick up trash, potentially revolutionizing environmental cleanup in areas inaccessible to traditional robots.
Robots can now loosen screws with human-level dexterity thanks to a new framework that combines haptic estimation, online planning, and adaptive stiffness control using a parameterized Equilibrium Manifold.
Train web-navigating agents in safe, scalable, and verifiable synthetic environments automatically cloned from real websites, sidestepping the risks and limitations of real-world interaction.
You can now detect whether an AI *really* wants to stay on, or is just pretending.
Ditching flat text for structured linked data in RAG systems can boost accuracy by nearly 30%, but only if you go beyond basic JSON-LD and add agent-friendly instructions and neural search.
By grounding LLMs in a hybrid knowledge base and using a Chain of Verification approach, PharmGraph-Auditor turns unreliable LLM generators into transparent reasoning engines for prescription auditing.
Item agents that self-promote can simultaneously boost recommendation accuracy and fairness, overturning the assumption that these goals are inherently at odds.
LLMs can be better aligned to human values by fusing the outputs of multiple "moral agents" representing diverse ethical perspectives, outperforming single-agent approaches.
AI agents can detect smart contract vulnerabilities, but don't expect them to autonomously exploit real-world security incidents anytime soon.
AgentServe achieves up to 2.8x improvement in time-to-first-token and 2.7x in time-per-output-token for agentic workloads on a single GPU by strategically isolating prefills and decodes.
LLMs in collaborative coding often stumble on interaction subtleties, leading to a new class of problems called "Interaction Smells" that can now be systematically identified and mitigated.
LLMs still struggle to generate high-quality interactive HTML applications, despite their advancements in code generation, highlighting a gap that MiniAppBench aims to address.
Human-in-the-loop learning can now boost dexterous manipulation VLA models by 25%, thanks to a new framework that smartly samples corrective actions and enables real-time intervention.
Explicitly teaching LVLMs to reason step-by-step with reinforcement learning unlocks state-of-the-art performance on multimodal object-entity relation extraction.
LLMs can now autonomously retrieve relevant memories from a database using specialized tools, significantly improving performance on long-term conversational question answering.
Forget RLHF – steering LLM multi-agent conversations might be as simple as crafting the right sequence of prompts.
Forget dataset-specific hacks: ESAinsTOD leverages instruction and schema alignment to achieve state-of-the-art task-oriented dialogue performance with strong generalization, even in low-resource settings.
Forget retraining: Ego personalizes VLMs on the fly by extracting and leveraging visual tokens that represent specific concepts using the model's internal attention.
An AI agent can triage remote patient monitoring data with higher sensitivity than individual clinicians, suggesting a path to scalable and cost-effective patient monitoring.
Securing enterprise multi-agent systems boils down to rigorously controlling tool orchestration and memory management, which can slash exploitable trust boundaries by over 70%.
Zero-shot robotic manipulation is now within reach: TiPToP matches a 350-hour fine-tuned model without *any* robot data.
LLMs can now emulate debuggers, stepping through code and setting breakpoints, opening the door to more interactive and controllable neural program execution.
Automating the messy process of turning open-source code into LLM tools unlocks a new level of agent capabilities, outperforming even commercial LLMs.
Stop training LLMs on lucky guesses: this new RL method uses the model's own in-context learning ability to identify and upweight high-quality reasoning traces, leading to better performance.
By communicating in a shared latent space, Latent-DARM lets you combine the global planning of diffusion models with the fluency of autoregressive models, boosting reasoning accuracy by up to 14% while slashing token usage.
LLM agents can now achieve a +41pp boost in first-try success and 100% accuracy in 2-way logistics compositions by using PRECEPT's novel combination of retrieval, memory, and prompt evolution.
VLMs can now self-evolve from *zero* data, thanks to a multi-agent RL framework that synthesizes its own visual concepts and reasoning tasks.
Even GPT-5 struggles with multi-modal robustness and turn overhead when user personas and multi-modal inputs are considered in agent evaluation, revealing critical gaps in current LLM agent capabilities.
Forget retraining: this guideline-aware AI agent instantly adapts to new radiotherapy protocols, outperforming supervised models in clinical preference.
Medical multi-agent systems can reason deeply, but fall apart when switching between medical specialties, highlighting a critical need for more robust architectures.
Chain-of-Agents can reason more accurately over long contexts by processing information chunks in an order determined by Chow-Liu dependency trees, rather than relying on default or semantic similarity.
LLM-powered recommendation agents can now autonomously investigate and bridge information gaps, leading to better recommendations, thanks to a new tool-augmented reasoning framework.
LLMs can drive pedagogical agents to be more engaging and effective by dynamically generating speech and gestures that align with the semantic context of instructional content.
A new video-based reward model beats GPT-5.2 and Gemini-3 Pro at evaluating computer-using agents, offering a scalable, model-agnostic alternative to traditional methods.
Skip the costly policy training: this zero-shot method nails text-goal instance navigation by grounding language in 3D geometry for smarter exploration and verification.
Current AI models fall short when asked to understand a situation from the combined perspectives of multiple embodied agents, as revealed by a new challenging benchmark.
FetalAgents leapfrogs existing fetal ultrasound analysis tools by dynamically orchestrating specialized AI agents, outperforming monolithic models across diverse clinical tasks and delivering structured clinical reports from video streams.
By injecting symbolic reasoning into vision-language-action models, NS-VLA achieves remarkable gains in data efficiency and generalization for robotic manipulation.
LLM-powered VR guides for blind and low vision users are not just tools, but social actors, prompting users to give them nicknames and rationalize their mistakes when others are present.
Prompt engineering is dead; long live context engineering—the key to scaling multi-agent AI systems lies in carefully designing the agent's informational environment, not just individual prompts.
Retrieval-augmented agents get a serious reasoning boost by explicitly evaluating their own retrieval quality at each step, leading to state-of-the-art performance on multi-hop question answering.
Forget black-box policies: CSRO uses LLMs to generate human-readable code policies in multi-agent RL, achieving performance competitive with traditional methods.
LLMs that dominate in strategic reasoning often choke in real-time zero-sum games, revealing a critical strategy-execution gap that current benchmarks miss.
Spectrum regulators can now leverage AI to dynamically plan and allocate spectrum resources, thanks to a new data-driven approach that accurately forecasts demand with high reliability across diverse urban environments.
Emotional states can bias swarm decision-making, but even symmetric emotional conditions can lead to decisive wins due to non-linear amplification.
Tired of sifting through mountains of internal docs? This RAG system uses a clever two-tiered vector DB to surface the right physics analysis, not just keywords.
Forget tweaking knobs – this new Gram-matrix-based audio representation lets you *retrieve* the perfect, editable audio effect preset, outperforming standard methods.
Stop wrestling with finicky evaluation codebases: One-Eval lets you specify LLM evaluation tasks in natural language and automatically executes them end-to-end.
ProvAgent slashes the cost of reconstructing near-complete attack processes to just $0.06 per day by replacing human analysts with a multi-agent system for threat investigation.
Achieve up to 11x navigation performance gains in functional buildings by explicitly encoding and exploiting a priori spatial knowledge.
LLMs can now tackle complex table QA with 20%+ accuracy gains, thanks to a multi-agent framework that decomposes queries and orchestrates reasoning between specialized database and knowledge graph agents.
Forget separate lectures: this AI Engineering curriculum throws students into interdisciplinary agile projects, embedding AI tools directly into their workflows for a hands-on, future-proofed learning experience.
Forget data quantity, diversity is the secret sauce: scaling the variety of tool-use patterns in training data boosts LLM generalization by +22 points on OOD benchmarks, even with 4x less data.
AutoAgent dynamically evolves agent cognition and memory to achieve superior performance in complex, dynamic environments, without requiring external retraining.
EQA agents can now handle dynamic, human-populated scenes better thanks to a training-free method that selectively remembers only the most informative visual evidence.
Traditional time-based authorization schemes are dangerously slow in multi-agent systems: a new coherence strategy slashes unauthorized API calls by over 100x, offering a velocity-agnostic safety guarantee.
Human-AI interaction isn't just augmentation, it's a new cognitive entity with its own emergent "vibe," demanding we rethink epistemology and education.
Forget finetuning on curated datasets – OpenClaw-RL lets agents learn directly and continuously from *every* interaction, turning user replies, tool outputs, and even GUI changes into valuable RL signals.
A hierarchical OODA loop architecture can significantly improve the adaptability and efficiency of UAV swarms operating in dynamic, uncertain environments.
VR agents that "listen" to your tone, not just your words, elicit significantly better user experiences.
Forget external rewards—this agent learns to explore and adapt by prioritizing its own ignorance, surprise, and staleness, outperforming fixed strategies.
Current AI security frameworks are woefully inadequate for multi-agent systems, leaving critical vulnerabilities like non-determinism and data leakage largely unaddressed.
Achieve expert-level bronchoscopic navigation without external sensors by having a world-model critic arbitrate between reactive and strategic AI agents.
Unlock human-like dexterity in robotic manipulation by combining RL-assisted teleoperation with a novel VLA architecture that leverages force and tactile feedback.
LLMs struggle to navigate the complexities of real-world finance, as evidenced by a new benchmark revealing their limitations in timeliness, regulatory compliance, and tool selection across 760 financial APIs.
Framework choice in multi-agent systems matters just as much as the LLM itself, a fact obscured by existing model-centric benchmarks.
Turn your Inspire RH56DFX hand from a black box into a research tool with this characterization, simulation, and control pipeline that achieves 87% grasp success on diverse objects.
LLMs can be used to prune irrelevant information *before* planning, enabling efficient long-horizon multi-robot task planning that outperforms both pure LLM and hybrid LLM-PDDL approaches.
LLM agents can learn to continuously adapt and improve in complex environments by reflecting on past experiences and explicitly storing/retrieving reusable lessons, leading to substantial performance gains.
Forget prompt engineering voodoo: this framework treats agent prompts as compiled artifacts, using tests to drive development and catch silent regressions before they hit production.
For pennies, a new framework reveals critical vulnerabilities in the system prompts of leading coding agents like Claude, Codex, and Gemini, demonstrating the power of multi-model LLM scouring.
LLM-powered diagnostic AI is ready for prime time: a real-world clinical trial shows it's safe, patients love it, and doctors find it useful.
By closing the loop with explicit planning and feedback, SPIRAL overcomes the temporal drift and weak semantic grounding plaguing one-shot video generation models.