98 papers published across 10 labs.
Training domain-specific coding LLMs with realistic environments and large-scale RL can yield substantial gains in practical software engineering tasks.
Giving medical imaging AIs the same tools as human doctors actually *hurts* their performance, revealing a surprising lack of spatial reasoning.
Autonomous coding agents can now outperform expert-engineered attention kernels on NVIDIA's latest Blackwell GPUs, discovering optimizations that eluded human experts.
Stop relying on brittle classifiers: SEAR uses LLM reasoning and a unified SQL query layer to evaluate, route, and explain decisions in LLM gateways.
LLM agents can achieve near-perfect memory recall without prohibitive costs by strategically combining fast, lossy retrieval with slower, exhaustive deliberation.
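The paper's exact mechanism isn't detailed here; as a minimal sketch of the general two-tier pattern (all names and the `MEMORY` store are hypothetical, and fuzzy string matching stands in for a real retriever):

```python
from difflib import SequenceMatcher

# Hypothetical memory store: past observations as plain strings.
MEMORY = [
    "user prefers metric units",
    "project deadline moved to Friday",
    "API key rotated on 2024-01-02",
]

def fast_retrieve(query, k=2):
    """Cheap, lossy pass: rank memories by fuzzy string similarity."""
    scored = [(SequenceMatcher(None, query, m).ratio(), m) for m in MEMORY]
    scored.sort(reverse=True)
    return scored[:k]

def recall(query, confidence_threshold=0.5):
    """Use the fast pass when it is confident; otherwise fall back to a
    slower, exhaustive scan of every memory entry (standing in for
    LLM-driven deliberation over the full store)."""
    top = fast_retrieve(query)
    if top and top[0][0] >= confidence_threshold:
        return top[0][1]  # fast path: lossy retrieval was good enough
    # Slow path: deliberate over every entry.
    return max(MEMORY, key=lambda m: SequenceMatcher(None, query, m).ratio())
```

The design point is that the expensive exhaustive pass only runs when the cheap pass is uncertain, so average cost stays close to the fast path while recall approaches the slow one.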
GUI agents struggle with long tasks not because they mis-click, but because they forget what they were doing, and a new "anchored memory" method can fix it.
Closed-loop feedback using VLMs can dramatically improve text-to-image generation quality, even without additional training.
Despite advances in LLMs, human-AI collaboration still significantly outperforms AI-only agents in domain-specific data science tasks, proving that human expertise remains crucial.
Injecting demonstrations with a carefully annealed probability can drastically improve exploration in RLVR, even for tasks requiring novel reasoning or domain-specific knowledge.
Coordinating multi-robot teams to complete manipulation tasks just got easier: GoC-MPC handles dynamic task assignments and disturbances without training data or environment models.
LLMs analyzing binaries aren't just spitting out tokens – they're exhibiting surprisingly structured reasoning patterns like "early pruning" and "targeted backtracking" that could revolutionize how we understand and control these systems.
LLMs can reliably detect danger in secure environments, but they can't reliably verify safety, which breaks privacy-preserving agentic protocols.
Discovering an agent's hidden intentions is now possible by analyzing its interventions within a causal model, revealing the "why" behind its actions.
Agentic Business Process Management offers a blueprint for aligning AI agents with organizational goals, moving beyond simple automation to a framework of constrained autonomy.
Memory-augmented LLMs get a strategic upgrade: MemMA uses multi-agent reasoning to proactively guide memory construction and repair, leading to significant performance gains.
Forget prompt engineering – LSE trains LLMs to self-edit their own contexts at test time, outperforming even GPT-5 and Claude Sonnet 4.5 in Text-to-SQL and question answering.
Automating web data integration for expert querying is now possible: SODIUM-Agent achieves a 2x accuracy boost over existing systems on a new benchmark of 105 real-world tasks.
Ditch the syntax-only grind: a multi-modal assessment strategy proves that introductory programming courses can boost both coding skills and crucial soft skills like communication and critical thinking.
Forget brittle, hand-coded robot assembly routines: ATG-MoE learns complex, multi-skill manipulation directly from visual and language inputs, achieving impressive success rates in both simulation and real-world industrial tasks.
A peer-like social robot can effectively augment literacy tutor support for newcomer children, offering personalized language and cultural learning in resource-constrained community settings.
Training multi-turn LLM agents just got easier: ProRL Agent offers a scalable, API-driven rollout service that streamlines RL training across diverse tasks.
Forget scaling laws: the *structure* of your AI governance system matters more than the specific LLM when it comes to preventing corruption.
LLM agents can slash task completion time by almost 50% simply by predicting and pre-executing likely tool calls.
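The paper's predictor isn't described here; as a sketch of the speculative-execution idea under stated assumptions (`slow_tool` and the hard-coded `predict_next_call` are stand-ins for a real tool and a learned predictor):

```python
import concurrent.futures
import time

def slow_tool(query):
    """Stand-in for a latency-bound tool call (e.g. a web search)."""
    time.sleep(0.2)
    return f"results for {query!r}"

def predict_next_call(history):
    """Hypothetical predictor: guess the agent's next tool call from context.
    A real system would use a small model; here the guess is hard-coded."""
    return ("slow_tool", "weather in Paris")

def run_turn(history, actual_call):
    """Speculatively launch the predicted tool call while the LLM is still
    deciding; reuse the result only if the prediction matches."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        predicted = predict_next_call(history)
        future = pool.submit(slow_tool, predicted[1])  # pre-execute
        # ... the LLM produces its real tool call here ...
        if actual_call == predicted:
            return future.result()       # hit: tool latency is hidden
        future.cancel()                  # miss: discard the speculation
        return slow_tool(actual_call[1])  # fall back to normal execution
```

On a prediction hit, the tool's latency overlaps with the model's decoding time, which is where the reported speedup would come from; a miss just degrades to the normal sequential path.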
Weaker autonomous web agents readily trust tampered website content, producing unsafe outputs, while stronger models exhibit better anomaly detection and safer fallback strategies under MITM attacks.
Forget months of manual coding: AutORAN lets you build and deploy O-RAN xApps from natural language in minutes.
LLMs can generate significantly more novel and technically rigorous scientific ideas by explicitly learning to reason from motivations to methodologies.
RAG systems can achieve state-of-the-art performance by explicitly preserving document topology, outperforming LLM-based chunking while simultaneously reducing token overhead.
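The paper's topology model is surely richer than this; as a minimal sketch of the core idea, split at the document's own headings and tag each chunk with its heading path rather than slicing fixed-size token windows (function name and chunk schema are illustrative):

```python
def structure_aware_chunks(markdown_text):
    """Chunk a Markdown document along its heading hierarchy, so each
    chunk carries the path of headings above it as retrievable context."""
    chunks, path, buf = [], [], []

    def flush():
        if buf:
            chunks.append({"path": " > ".join(path), "text": "\n".join(buf)})
            buf.clear()

    for line in markdown_text.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(line.lstrip("# ").strip())
        else:
            buf.append(line)
    flush()
    return chunks
```

Because the heading path rides along with each chunk, the retriever sees "where in the document" a passage lives without re-embedding the surrounding text, which is one plausible source of the token savings the summary mentions.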
Stop leaking your secrets to the cloud: PlanTwin lets LLM agents plan over your private data without actually exposing it.
AI can now handle the tedious copywriting and real-time Q&A for live-streaming commerce, freeing up human streamers to focus on engagement.
Even GPT-5 and Gemini 2.5 Pro still fail to efficiently couple reasoning with tool use, requiring up to 2.7x more tool calls than theoretically optimal in a new diagnostic environment.
Blindly maximizing human-AI performance can degrade human expertise over time, revealing a critical trade-off that demands a new approach to system design.
A snapshot of the cutting-edge research uniting Theory of Mind and AI, all in one open-access collection.
Automating linguistically-grounded sign language annotation is now possible, unlocking scalable dataset curation previously limited by manual effort.
Current benchmarks fail to rigorously evaluate deep research agents, but a new framework leveraging structured knowledge bases and synthetic data offers a verifiable and scalable solution.
Decomposing GUI agent trajectories into verifiable milestones and auditing the evidence chain yields a 10% boost in RL training performance, outperforming single-judge reward systems.
Forget hand-crafting agents: Memento-Skills lets a generalist LLM agent autonomously design and improve specialized agents through experience, achieving substantial gains on complex benchmarks.
Seemingly efficient VLA models can be surprisingly inefficient when deployed on robots, highlighting the need to move beyond standard metrics like FLOPs and parameters.
Skip the expensive reward model: RewardFlow distills sparse task rewards into dense, state-level signals by propagating credit through the topology of LLM reasoning trajectories.
LLMs can orchestrate complex wireless communication optimization tasks by translating natural language intent into actionable spatial constraints, enabling gradient-based solvers to outperform traditional methods without requiring domain-specific fine-tuning.
Aligning rewards with sub-goals and emphasizing key trajectory segments with hindsight information significantly improves multi-turn agentic RL, outperforming existing methods on complex tasks.
Neural solvers can now effectively handle the complexities of multi-agent coordination and multi-objective trade-offs in routing problems, outperforming traditional heuristics.
The EU's AI regulations struggle to keep pace with agentic AI, blurring the lines between security and privacy.
Forget blind exploration: injecting LLM-derived semantic understanding into DRL dramatically boosts UAV-aided network connectivity and slashes energy consumption.
Forget scaling laws: Mi:dm K 2.5 Pro proves that targeted training pipelines and data curation can enable a 32B parameter model to achieve state-of-the-art performance in enterprise reasoning tasks, especially in low-resource languages like Korean.
Guaranteeing secure and compliant agent behavior in B2B environments may finally be within reach thanks to a new cryptographic admission control protocol.
LLMs can control robots for complex disassembly tasks, but only if you give them structured APIs – otherwise, expect a 43% failure rate.
Ditching rigid digital twins for adaptable world models could unlock truly intelligent edge computing in 6G networks.
Unleash creativity in text-to-image models with a single, reusable 64-token template, sidestepping costly iterative prompt engineering and reasoning.
Forget complex communication protocols – this trust-based algorithm lets agents learn to cooperate in competitive environments with minimal overhead.
AI career coaches can boost short-term goal progress not just through reflection, but by making users feel more socially accountable.
Forget finetuning – Kumiho's graph-native memory lets you swap in a better LLM and instantly double your agent's reasoning accuracy on complex cognitive tasks.
Forget tool-augmented systems: NEO shows you can consolidate search, recommendation, and reasoning into a single language-steerable LLM by representing items as SIDs and interleaving them with natural language.
Instead of passively transcribing doctor-patient dialogues, this system actively models what's known, what's missing, and what questions to ask next, paving the way for more intelligent EMR systems.
Robots often ignore your commands mid-task, but ReSteer offers a way to fix this by pinpointing and patching the "blind spots" in their training data.
Robots can now nimbly navigate complex, multi-floor environments without prior training, thanks to a new strategy that dynamically switches between exploration, recovery, and memory recall.
Agentic LLMs are surprisingly vulnerable: a new framework finds successful attacks in 84% of attempts by escalating prompt injection techniques across multiple stages.
RL agents can learn far more efficiently by dynamically distilling and leveraging past experiences that co-evolve with the agent's growing capabilities.
A multi-agent LLM system can fuse heterogeneous data sources to accurately classify building ages from satellite imagery, enabling better urban energy planning despite class imbalances in historical building cohorts.
LLMs can act as effective action-level supervisors in reinforcement learning, dramatically boosting the sample efficiency of SAC without sacrificing convergence guarantees.
Forget rigid physics engines, this badminton RL environment uses real player data to simulate realistic rallies and strategic gameplay.
Grounding LALM reasoning in diverse, reliability-weighted acoustic evidence blows away the competition in Audio Question Answering, proving that verifiable chains beat black boxes.
Simply prompting for test-driven development can *increase* regressions in AI coding agents; instead, focus on surfacing contextual information about which tests are most relevant.
LLMs in embodied environments get a massive boost from structured rules, with rule retrieval alone contributing +14.9 pp to single-trial success.
Forget prompt privacy – your LLM's responses are leaking *enterprise data*, and this paper shows how to quantify and control it.
Automating surgical patient triage with an LLM achieves 94% sensitivity, but discrepancies reveal more about clinical workflow gaps than AI errors.
Forget training wheels: GoalVLM lets multi-agent robots navigate to any object you describe, no pre-programmed categories needed.
Enterprise AI can achieve 50% token reduction and zero cross-entity leakage by implementing a shared, governed memory architecture for multi-agent workflows.
Current LLM agent safety benchmarks miss over 20% of unsafe behaviors: agents that pass the benchmark can still act unsafely.
Tool-using agents are failing in predictable ways, but a model-agnostic policy layer can measurably improve their safety and reliability, albeit with a clear utility tradeoff.
Forget complex multi-agent systems: Skele-Code's no-code interface slashes token costs by shifting agent involvement to code generation only, enabling subject matter experts to build agentic workflows directly.
Despite the ease of integrating ML cloud services, developers are widely misusing them, leading to quality and maintainability issues that MLmisFinder can now automatically detect with high accuracy.
Forget about chasing the perfect model architecture – this work suggests the real key to better AI agents lies in crafting more precise and complete specifications, since the implementation can always be re-generated.
Scene graphs plus LLMs let robots ask clarifying questions, boosting multi-agent task success by 15%.
LLMs armed with RAG can reconstruct cyberattacks with high precision and recall, but the best model for the job depends on your budget: DeepSeek V3 matches Claude Sonnet 4's accuracy at 1/15th the cost.
Achieve SOTA LLM alignment in complex technical domains with a fraction of the compute by distilling knowledge into smaller models using a hybrid reward mechanism and targeted data augmentation.
Fine-grained access control for websites can finally enable safe and reliable delegation of critical tasks to AI agents.
LLM-powered trading agents can still achieve a Sharpe ratio of 1.40 even when completely blindfolded to ticker symbols and company names, suggesting genuine understanding of market dynamics.
Retrieval-augmented LLM agents can learn to learn from experience, achieving significantly better generalization on unseen tasks by combining the strengths of fine-tuning and in-context retrieval.
A 4B parameter model can nearly match the privilege escalation performance of a state-of-the-art closed LLM like Claude Opus, while being fully local and 100x cheaper to run.
LLMs acting as semantic interfaces to our brains pose unprecedented ethical risks to mental autonomy and neurorights, demanding a new "second-order neuroethics."
LLMs can be economically aligned to real-world consumer preferences via post-training on transaction data, enabling more accurate and stable economic simulations.
Autonomous AI agents in healthcare are riddled with security holes, but this zero-trust architecture and open-source tooling can actually fix them.
You can now audit multi-agent LLM systems and trace responsibility for harmful outputs even without access to internal execution logs, thanks to a clever "self-describing text" technique.
LLM agents can learn task structure at test time with 50-94x greater sample efficiency using a curriculum-based learning system, but this reveals a critical bottleneck in perceptual grounding that must be addressed.
Forget prompt engineering: AgentFactory lets LLM agents self-evolve by accumulating and refining executable Python subagents, making task re-execution more reliable and efficient.
Grey-box fuzzing of LLM agents, guided by tool invocation sequences, reveals significantly more prompt injection vulnerabilities and malicious behaviors than black-box testing alone.
Forget static honeypots – LLMs and RL could make cyber deception dynamic and adaptive, turning the tables on attackers in contested environments.
Symphony's cognitively-inspired multi-agent system significantly boosts long-form video understanding by mimicking human reasoning, achieving state-of-the-art results on multiple benchmarks.
Existing threat models fail to capture the unique vulnerabilities of Model Context Protocol systems, but MCP-38 fills this gap with a comprehensive taxonomy of 38 distinct threat categories.
Forget collapsing videos into text – this hierarchical grid lets you zoom into any moment with lossless visual fidelity, unlocking logarithmic compute scaling for long-form video understanding.
Digital literacy gaps shrink as a browser extension slashes information retrieval time by 87% using an AI-powered tooltip that defines technical acronyms on demand.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
Generalizing RL to continuous state and action spaces just got easier: this paper introduces an operator-theoretic framework and PPO-type algorithms that ditch finite-state assumptions.
LLMs can achieve state-of-the-art Alzheimer's detection by mimicking clinical cognitive assessment protocols, not just learning statistical patterns.
LLMs can navigate complex 3D environments more effectively and with far fewer tokens by using a hierarchical scene graph representation derived from omnidirectional sensor data.
LLMs can now generate Verilog code that's not just correct, but also optimized for real-world hardware constraints like power, performance, and area, thanks to a novel multi-agent system with evolving memory.
AdaZoom-GUI achieves SOTA GUI grounding by adaptively zooming in on small elements and refining ambiguous instructions, outperforming even larger models.
VLMs can now drive embodied agents to navigate complex environments with unprecedented efficiency, thanks to a novel framework that bridges the gap between 2D semantic understanding and 3D spatial reasoning.