100 papers published across 9 labs.
Multi-agent LLM systems can maintain sub-4-second response times even under classroom-scale concurrency, but only with the right throughput tier.
Quantum education gets a boost: specialized LLM agents in a classroom setting not only improve tutoring reliability but also reveal hidden curriculum gaps.
LLM agents can better discover and assess the risks of skills when those skills are encoded in a structured format that makes scheduling, execution structure, and logic explicit, rather than described in unstructured text.
Unleash your AI agent's business acumen: this framework lets AI not just analyze experiments, but actively ideate, personalize, and optimize business strategies within a safe, unified software interface.
LLMs are revolutionizing conversational AI research, and this survey offers a structured guide to navigating the rapidly evolving landscape of LLM-powered user simulation.
Forget handcrafted prompts: a hierarchical multi-agent framework turns diffusion models into coherent storytelling engines by globally optimizing for semantic coherence.
Forget painstakingly collecting real CAD data – Zero-to-CAD lets you bootstrap CAD program generation from multi-view images using a million-scale dataset synthesized entirely by an LLM agent.
Frontier AI agents can now autonomously recreate sophisticated ML pipelines like AlphaZero for Connect Four, signaling a leap in their ability to accelerate AI research itself.
Forget expensive per-task search: agentic workflows can be synthesized in a single LLM pass by transferring learned structural priors, slashing optimization costs by 3 orders of magnitude.
Today's best web agents are shockingly inefficient, achieving only 1.15% trajectory efficiency on realistic long-horizon tasks, revealing a critical need to move beyond simple success rates.
LLM benchmarks are riddled with hidden flaws that even human experts miss, but can be caught with an automated LLM auditor for under $15 per benchmark.
Machine translation alone ruins agent benchmark validity across languages, but careful functional and cultural alignment can close the performance gap by up to 30%.
Quantifying the efficiency of human-AI collaboration boils down to balancing the agent's work output against the human's time investment in task specification, interruptions, and review.
Dependency-controlled context and explicit evidence sufficiency criteria are key to preventing premature stopping and improving the consistency of enterprise research outputs.
LLMs that nail individual personas can still fail spectacularly at generating diverse populations, instead defaulting to coarse stereotypes.
Explicitly enumerating skills in-context doesn't scale for agentic LLMs, but retrieving skills on demand can substantially improve performance – if the LLM can figure out when and which skill to load.
Stop handing over the keys to the kingdom: SUDP lets agents use your secrets without ever actually seeing them, preventing prompt injection from turning into full account takeover.
LLM-based tutors can accumulate more data about students than instructors can access, creating a "Blind Instructor Problem" that this multi-agent system tackles head-on.
DKnownAI Guard blows away AWS, Azure, and Lakera in head-to-head security tests for AI agents.
Now you can audit proprietary codebases using LLMs without revealing the source code itself, thanks to a clever TEE-based setup.
Securing autonomous AI agents demands a lifecycle-oriented approach, and AgentWard provides a blueprint for defense-in-depth across initialization, input processing, memory, decision-making, and execution.
LLM multi-agent systems can substantially reduce operational costs by using effective attack remediation to facilitate early consensus and cut off token generation by adversarial agents, as shown by GAMMAF.
Forget static defenses: LLM-powered "Defender" agents can dynamically harden cyber ranges, slashing attacker success rates and leveling the playing field as AI-driven threats evolve.
Forget external firewalls – ClawdGo teaches AI agents to spot and fend off attacks from the inside, boosting their security smarts by 20% through self-play.
LLM agents can achieve near-impregnable defense against prompt injection with minimal utility loss by borrowing classic operating system virtualization techniques.
Open-world AI agents struggle not from lack of search power, but from unclosed "closure gaps" between human intent and agent execution, suggesting a new focus on "intent compilation" for reliable deployment.
LLMs can find and fix bugs in complex codebases far better when structured as a team of reasoning agents, outperforming existing methods by a large margin.
LLM agent reliability metrics hide a wealth of information: modeling execution traces as Markov chains reveals the underlying success-time distribution and quantifies uncertainty, offering a richer understanding of agent behavior.
More reviewer bot comments on agentic pull requests actually *increase* resolution time, suggesting that quality trumps quantity in automated code review.
LLMs can achieve near-perfect structural fidelity when generating multi-file DSL code at repository scale, but only with fine-tuning.
LLMs can both spark and stifle creativity in collaborative software design, so designers must wield them intentionally.
LLM-powered debugging agents can achieve state-of-the-art program repair performance at a fraction of the cost by switching from line-by-line debugging to a function-level interaction paradigm.
Students are already using GenAI extensively in real-world software projects, but without guardrails, learning, collaboration, and software quality may suffer.
LLMs can now generate reliable hardware reference models with 95% accuracy thanks to a novel co-evolutionary verification mechanism that weeds out correlated hallucinations between model and testbench.
Benchmarks alone don't tell the whole story: AgentPulse reveals that real-world adoption signals often diverge significantly from static performance metrics, especially for closed-source, high-capability agents.
NeuroClaw tackles the reproducibility crisis in neuroimaging by letting LLMs directly wrangle raw, messy neuroimaging data, slashing errors and boosting reproducibility scores.
Agentic AI struggles with Earth Observation because reprojection, resampling, and other geospatial operations silently corrupt data, demanding a new agent design paradigm.
Network jitter in cloud-based robot control can be overcome by converting temporal lag into spatial pose offsets, restoring the VLA's original geometric intent without fine-tuning.
6G-enabled Internet of Everything promises a unified intelligent ecosystem, but faces critical scalability, security, and privacy challenges that demand innovative research.
AI agents can autonomously orchestrate the entire machine learning pipeline for protein-protein interaction prediction, from data collection to rule induction, offering a new level of automation and interpretability.
You don't need billions of parameters to accurately ground GUI elements: GoClick, a 230M parameter model, matches the performance of much larger models, opening the door for on-device GUI agents.
Vanilla on-policy distillation falls apart in multi-turn settings due to compounding errors, but a simple curriculum on trajectory length fixes it, even letting students beat their teachers.
Speculative design can effectively catalyze critical reflection and generate actionable insights for fostering designer inclusion within the often developer-centric world of Open Source Software.
Existing GUI agents can parrot actions, but AutoGUI-v2 reveals they still lack a deep understanding of GUI functionality and struggle to predict the outcomes of even simple interactions.
LLM agents struggle to maintain performance in multi-day collaborative tasks, dropping significantly after just one environmental update, revealing a critical gap in adaptation to evolving real-world conditions.
Stop blindly trusting LLMs: PageGuide visually grounds AI answers directly in the webpage, slashing task times by up to 70% and boosting accuracy by 26%.
Neurosymbolic grounding of LLMs in telemetry and knowledge graphs slashes expert-rated overclaims in industrial maintenance explanations by 93%, making AI assistants far more trustworthy in safety-critical settings.
Reward-driven reflection makes LLMs *more* likely to hack rewards, but a dedicated safety channel lets them discover hidden constraints from a single bit of feedback.
Semantic similarity is a poor proxy for agent performance: ranking agents based on execution-aware probing beats description-based retrieval by a wide margin.
The fragmented field of world modeling can now be unified under a "levels x laws" taxonomy, revealing critical gaps in autonomous model revision and decision-centric evaluation.
Forget rigid multi-agent pipelines: this framework lets you build self-organizing AI "companies" that dynamically recruit talent and adapt to tasks on the fly.
VLAA-GUI's innovative framework allows autonomous agents to not only verify their success but also adaptively recover from failures, achieving human-level performance in GUI tasks.
LLMs can now reason across long conversations without breaking the bank: StructMem slashes token usage and API calls while boosting temporal reasoning.
LLMs generate better features when you make them think harder: CoFEE enforces cognitive behaviors like backward chaining and subgoal decomposition, boosting feature quality by 15% while slashing costs.
Integrating deep learning forecasting with MILP optimization slashes inventory costs by 5.4% and stockouts by 27.5% in textile and PPE supply chains.
LLMs can plan complex trips far more effectively when their reasoning is structured as a "forest" of parallel behavior trees, each handling a subtask and coordinated globally.
A game-theory-inspired ensemble of LLMs and a lightweight verifier slashes the cost of code vulnerability detection while boosting accuracy, proving that strategic agent design can beat brute-force scaling.
Learnable critics that evaluate the model's own GUI grounding proposals, rather than relying on static geometric heuristics, unlock substantial gains in accuracy.
LLMs can achieve a form of self-programming by integrating crowdsourced learning and human creativity to iteratively refine their own game-playing logic.
Automating the semantic translation of research questions into scientific workflows slashes data transfer by 92% and keeps LLM overhead under 15 seconds per query.
LLM agents are wasting up to 60k tokens per turn on unnecessary tool schemas – Tool Attention slashes this "Tools Tax" by 95% and unlocks truly scalable agentic workflows.
Forget complex architectures: the secret to self-improving LLM agents lies in teaching them how to *interpret* their past failures, not just remember them.
Ditch the fixed interface: DiffMAS unlocks surprisingly large gains in multi-agent reasoning by jointly optimizing latent communication, outperforming text-based and prior latent methods by a wide margin.
LLMs can be both faster and smarter: pre-learned reasoning skills cut down token usage while boosting accuracy on coding and math problems.
Forget prompt engineering – GROUNDING.md lets you bake domain expertise directly into AI coding agents, ensuring scientific validity even when users aren't experts.
LLMs can debug code *without* human-provided test cases, autonomously generating inputs and execution traces to match the performance of public-test-dependent methods while reducing token consumption.
Forget rigid workflows: HiCrew's planning layer dynamically orchestrates agents for video understanding, adapting roles and execution paths to the nuances of each question.
LLM-driven visual agents form complex communication structures, but stubbornly resist stylistic convergence, revealing a fundamental tension between social expression and individual identity.
Forget scaling laws – AgenticQwen proves that clever training with dual data flywheels can enable small language models to rival giants in real-world agentic tasks.
Fine-tuning LLMs on expert-validated, real-world crisis conversations allows them to generate psychologically aligned responses that better support mental health counselors, even in low-resource languages.
LLM agent distillation leads to surprisingly high rates of behavioral mimicry, with some student models exhibiting tool-use habits *more* similar to their teacher's than those of models from the teacher's own family.
LLMs can significantly boost multi-table entity matching by cleverly coordinating attributes, embedding entities, and pruning noise.
LLM agents can have their proprietary skills stolen with just 3 interactions, exposing a major copyright vulnerability in the burgeoning skill marketplace.
Most AI agent social platforms are actually just bots trading crypto.
LLM agent self-reporting is dangerously unreliable for security assessments, diverging from actual execution traces in up to 100% of critical actions, demanding a shift towards trace-based auditing.
Frozen LLMs, when fused with spatial scene encodings, can effectively reason about vehicle trajectories, opening new avenues for integrating language-based reasoning into autonomous driving systems.
Spatial reasoning gets a boost: a new framework dynamically orchestrates vision-language agents at test time, outperforming fixed-pipeline approaches by adapting to the reliability of different spatial cues.
Optimality guarantees are now possible when jointly optimizing robot design, fleet composition, and task planning for heterogeneous multi-robot systems.
Forget brittle visual-history buffers: LoHo-Manip uses a VLM task manager with visual trace prompts to achieve robust long-horizon robotic manipulation through implicit closed-loop replanning.
Forget dry training manuals: a challenge-based, LLM-powered humanoid robot can spark real employee excitement and understanding of robotics in the workplace.
Forget simple offloading – this framework intelligently decomposes LLM tasks across devices and edge servers, slashing latency and boosting rewards in congested WiFi networks.
LLMs are better at code analysis when forced to output structured data, beating agentic approaches while using 8x fewer tokens.
Forget top-down deployment: embedding researchers directly within cybersecurity teams to co-create LLM tools can overcome skepticism and drive real-world adoption.
LLMs can now reliably extract job skills from text, even in low-resource settings, thanks to a novel framework that enforces output validity and reduces hallucinations.
Uncover more LLM agent failures, faster: DIVERT's diversity-guided user simulation finds more bugs per token than standard rollout methods.
Lithology classification gets a reasoning upgrade: GeoMind's agentic workflow beats static methods by grounding decisions in geological evidence and constraints.
LLMs can now generate realistic online discussions, opening the door to studying deliberation dynamics at scale without real-world ethical and data access hurdles.
AI agents in medical research aren't ready for prime time: a new audit framework reveals that over half of evaluated skills fall below the "Limited Release" threshold, highlighting the need for domain-specific safeguards.
Turns out, coding agents in the wild are only writing useful code 44% of the time, and are introducing more security vulnerabilities than human developers.
World-model-based planning enables reliable robotic manipulation in complex industrial settings where reactive policies crumble.
Open-source MLLMs can now achieve state-of-the-art accuracy on complex tabular reasoning tasks, even outperforming models 18x their size, by explicitly penalizing visual hallucinations and shortcut guessing through process-supervised RL.
Forget fine-tuning behemoth LLMs for every new task – this paper shows how a tiny, nimble model generating smart supplements can unlock surprisingly strong agentic performance from frozen giants.
Imagine slashing the human effort needed to go from hypothesis to submission-ready ML theory paper by orders of magnitude.
Individual prosumers can now effectively coordinate in electricity markets, boosting overall market performance through a novel hierarchical MARL framework.
R2IF improves function-calling accuracy by up to 34.62%, bridging the gap between reasoning and decision-making in LLMs.
Forget one-shot generation: Mol-Debate's iterative debate loop unlocks state-of-the-art molecular design by dynamically reconciling semantic intent with structural feasibility.
Combining heuristics with learned models for graph sparsification yields significantly sparser and more reliable candidate graphs for TSP solvers, outperforming purely heuristic or learned approaches, especially as problem size increases.
Few-shot prompting outperforms complex hypernetwork adaptations, achieving 79.7% of GPT-5's performance with significantly lower latency.
Scaling multi-agent systems past 100 agents can trigger a "Synergistic Collapse" costing hundreds of thousands of dollars, but this framework prevents it.
LLMs maintain surface syntax but collapse on structural semantics, revealing critical gaps in their ability to function as reliable agents in complex environments.