Search papers, labs, and topics across Lattice.
100 papers published across 5 labs.
Ditching rigid digital twins for adaptable world models could unlock truly intelligent edge computing in 6G networks.
Unleash creativity in text-to-image models with a single, reusable 64-token template, sidestepping costly iterative prompt engineering and reasoning.
Forget complex communication protocols – this trust-based algorithm lets agents learn to cooperate in competitive environments with minimal overhead.
AI career coaches can boost short-term goal progress not just through reflection, but by making users feel more socially accountable.
Forget finetuning – Kumiho's graph-native memory lets you swap in a better LLM and instantly double your agent's reasoning accuracy on complex cognitive tasks.
Forget tool-augmented systems: NEO shows you can consolidate search, recommendation, and reasoning into a single language-steerable LLM by representing items as SIDs and interleaving them with natural language.
Instead of passively transcribing doctor-patient dialogues, this system actively models what's known, what's missing, and what questions to ask next, paving the way for more intelligent EMR systems.
Robots often ignore your commands mid-task, but ReSteer offers a way to fix this by pinpointing and patching the "blind spots" in their training data.
Robots can now nimbly navigate complex, multi-floor environments without prior training, thanks to a new strategy that dynamically switches between exploration, recovery, and memory recall.
Agentic LLMs are surprisingly vulnerable: a new framework finds successful attacks in 84% of attempts by escalating prompt injection techniques across multiple stages.
RL agents can learn far more efficiently by dynamically distilling and leveraging past experiences that co-evolve with the agent's growing capabilities.
A multi-agent LLM system can fuse heterogeneous data sources to accurately classify building ages from satellite imagery, enabling better urban energy planning despite class imbalances in historical building cohorts.
LLMs can act as effective action-level supervisors in reinforcement learning, dramatically boosting the sample efficiency of SAC without sacrificing convergence guarantees.
Forget rigid physics engines, this badminton RL environment uses real player data to simulate realistic rallies and strategic gameplay.
Grounding LALM reasoning in diverse, reliability-weighted acoustic evidence blows away the competition in Audio Question Answering, proving that verifiable chains beat black boxes.
Simply prompting for test-driven development can *increase* regressions in AI coding agents; instead, focus on surfacing contextual information about which tests are most relevant.
LLMs in embodied environments get a massive boost from structured rules, with rule retrieval alone contributing +14.9 pp to single-trial success.
Forget prompt privacy – your LLM's responses are leaking *enterprise data*, and this paper shows how to quantify and control it.
Automating surgical patient triage with an LLM achieves 94% sensitivity, but discrepancies reveal more about clinical workflow gaps than AI errors.
Forget training wheels: GoalVLM lets multi-agent robots navigate to any object you describe, no pre-programmed categories needed.
Enterprise AI can achieve 50% token reduction and zero cross-entity leakage by implementing a shared, governed memory architecture for multi-agent workflows.
Current LLM agent safety benchmarks miss over 20% of unsafe behaviors: agents that pass them still act unsafely in ways the benchmarks never probe.
Tool-using agents are failing in predictable ways, but a model-agnostic policy layer can measurably improve their safety and reliability, albeit with a clear utility tradeoff.
Forget complex multi-agent systems: Skele-Code's no-code interface slashes token costs by shifting agent involvement to code generation only, enabling subject matter experts to build agentic workflows directly.
Despite the ease of integrating ML cloud services, developers are widely misusing them, leading to quality and maintainability issues that MLmisFinder can now automatically detect with high accuracy.
Forget about chasing the perfect model architecture – this work suggests the real key to better AI agents lies in crafting more precise and complete specifications, since the implementation can always be re-generated.
Scene graphs plus LLMs let robots ask clarifying questions, boosting multi-agent task success by 15%.
LLMs armed with RAG can reconstruct cyberattacks with high precision and recall, but the best model for the job depends on your budget: DeepSeek V3 matches Claude Sonnet 4's accuracy at 1/15th the cost.
Achieve SOTA LLM alignment in complex technical domains with a fraction of the compute by distilling knowledge into smaller models using a hybrid reward mechanism and targeted data augmentation.
Fine-grained access control for websites can finally enable safe and reliable delegation of critical tasks to AI agents.
LLM-powered trading agents can still achieve a Sharpe ratio of 1.40 even when completely blindfolded to ticker symbols and company names, suggesting genuine understanding of market dynamics.
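As a reminder of the metric behind that 1.40 figure, here is a standard annualized Sharpe ratio computation. This is a generic sketch, not the paper's code; the daily frequency, 252 trading periods per year, and zero risk-free rate are assumptions for illustration.

```python
import math

def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return divided by its
    sample standard deviation, scaled by sqrt(periods per year).
    Assumes `returns` are per-period (e.g. daily) simple returns."""
    excess = [r - risk_free for r in returns]
    n = len(excess)
    mean = sum(excess) / n
    var = sum((r - mean) ** 2 for r in excess) / (n - 1)  # sample variance
    return (mean / math.sqrt(var)) * math.sqrt(periods_per_year)

# Four days of hypothetical daily returns
print(round(sharpe_ratio([0.01, -0.005, 0.007, 0.002]), 2))
```

A ratio above 1 is generally considered good risk-adjusted performance, which is what makes the blindfolded agents' 1.40 notable.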
Retrieval-augmented LLM agents can learn to learn from experience, achieving significantly better generalization on unseen tasks by combining the strengths of fine-tuning and in-context retrieval.
A 4B parameter model can nearly match the privilege escalation performance of a state-of-the-art closed LLM like Claude Opus, while being fully local and 100x cheaper to run.
LLMs acting as semantic interfaces to our brains pose unprecedented ethical risks to mental autonomy and neurorights, demanding a new "second-order neuroethics."
LLMs can be economically aligned to real-world consumer preferences via post-training on transaction data, enabling more accurate and stable economic simulations.
Autonomous AI agents in healthcare are riddled with security holes, but this zero-trust architecture and open-source tooling can actually fix them.
You can now audit multi-agent LLM systems and trace responsibility for harmful outputs even without access to internal execution logs, thanks to a clever "self-describing text" technique.
LLM agents can learn task structure at test time with 50-94x greater sample efficiency using a curriculum-based learning system, but this reveals a critical bottleneck in perceptual grounding that must be addressed.
Forget prompt engineering: AgentFactory lets LLM agents self-evolve by accumulating and refining executable Python subagents, making task re-execution more reliable and efficient.
Grey-box fuzzing of LLM agents, guided by tool invocation sequences, reveals significantly more prompt injection vulnerabilities and malicious behaviors than black-box testing alone.
Forget static honeypots – LLMs and RL could make cyber deception dynamic and adaptive, turning the tables on attackers in contested environments.
Symphony's cognitively-inspired multi-agent system significantly boosts long-form video understanding by mimicking human reasoning, achieving state-of-the-art results on multiple benchmarks.
Existing threat models fail to capture the unique vulnerabilities of Model Context Protocol systems, but MCP-38 fills this gap with a comprehensive taxonomy of 38 distinct threat categories.
Forget collapsing videos into text – this hierarchical grid lets you zoom into any moment with lossless visual fidelity, unlocking logarithmic compute scaling for long-form video understanding.
Digital literacy gaps shrink as a browser extension slashes information retrieval time by 87% using an AI-powered tooltip that defines technical acronyms on demand.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
Generalizing RL to continuous state and action spaces just got easier: this paper introduces an operator-theoretic framework and PPO-type algorithms that ditch finite-state assumptions.
LLMs can achieve state-of-the-art Alzheimer's detection by mimicking clinical cognitive assessment protocols, not just learning statistical patterns.
LLMs can navigate complex 3D environments more effectively and with far fewer tokens by using a hierarchical scene graph representation derived from omnidirectional sensor data.
LLMs can now generate Verilog code that's not just correct, but also optimized for real-world hardware constraints like power, performance, and area, thanks to a novel multi-agent system with evolving memory.
AdaZoom-GUI achieves SOTA GUI grounding by adaptively zooming in on small elements and refining ambiguous instructions, outperforming even larger models.
VLMs can now drive embodied agents to navigate complex environments with unprecedented efficiency, thanks to a novel framework that bridges the gap between 2D semantic understanding and 3D spatial reasoning.
A 7B model, fine-tuned with a novel inverse specification reward, can generate slide presentations rivaling those of much larger models, highlighting the importance of instruction adherence and tool use over raw parameter count.
Current multimodal browsing agents are surprisingly bad at using visual information on webpages, with even top models scoring below 50% accuracy on a new visual-native search benchmark.
Even when given identical data and research questions, autonomous AI coding agents exhibit surprisingly high variability in their empirical findings, raising concerns about the reliability of AI-driven research.
Stop wasting compute: this RL-trained orchestration policy adaptively decides when your embodied agent should reason with an LLM, slashing latency and boosting task success compared to fixed strategies.
Current AI agent governance methods are too static; runtime evaluation of execution paths is necessary for effective, path-dependent policy enforcement.
LLMs can't crack Clue: even state-of-the-art models struggle with multi-step deductive reasoning in a simulated text-based game, and fine-tuning doesn't reliably help.
Even without pre-loaded database schemas, a new RL agent matches or beats state-of-the-art text-to-SQL models that have full schema access.
Language models can learn directly from real-world user interactions, boosting performance without human annotations or simulated environments.
User-facing guardrails for LLM-enabled robots can balance flexibility and safety by offering constrained choices and clear recourse, rather than open-ended value settings.
LLMs can now reliably translate natural language into executable option trading strategies, thanks to a new domain-specific language that constrains their output to verifiable semantic parses.
Open-source VLMs can be easily fooled by simple gradient-based attacks, but the degree of vulnerability varies drastically across architectures.
RepoReviewer tackles the complexity of repository-level code review with a multi-agent architecture, breaking down the monolithic process into manageable stages for more relevant and efficient feedback.
Forget generic code generation – this work shows that structure-aware retrieval of domain-specific examples slashes the debugging needed to get LLMs to produce working scientific visualization pipelines.
Forget hand-crafted visual prompts – this framework automatically discovers counter-intuitive image manipulation strategies that dramatically boost LVLM perception.
A quadrupedal robot can now provide on-demand assistance to wheelchair users, offering a more agile and less intrusive alternative to fixed robotic arms.
You can provably find Nash equilibria even when one player only knows the *reaction* of the other, not their full objective.
Forget pre-built maps: this new navigation agent interprets signs like a human, achieving 80% success in complex indoor environments.
A novel DRL approach can extend XR device battery life by 163% without sacrificing real-time responsiveness, offering a practical solution to the energy-latency trade-off in immersive applications.
Forget expensive motion capture suits – TeleDex lets you teleoperate dexterous robots with just your phone.
A multi-agent system that mimics rubber-duck debugging slashes critical path delay by 25% and power consumption by 22% in RTL code, outperforming LLM-based baselines.
Human-centered design can successfully integrate AI to support collective intelligence in deliberative democracy, offering a pathway to more trustworthy and inclusive democratic processes.
Coding agents struggle to maintain faithfulness to specifications that emerge gradually over long interactions, losing significant implementation fidelity compared to single-shot specifications.
LLMs can automate the creation of enriched provenance graphs from system logs, leading to more accurate and interpretable anomaly detection without manual rule engineering.
By explicitly modeling attacker stages, DeepStage achieves significantly better defense performance against APTs than risk-aware baselines, suggesting that stage-aware reasoning is crucial for effective autonomous cyber defense.
AI-generated code's fluency masks a critical flaw: it often fails to deliver what users actually intend, highlighting the urgent need for "intent formalization" to bridge the gap between informal requirements and precise program behavior.
Smarter placement of slow chargers can significantly reduce the need for expensive en-route EV charging, leading to lower overall system costs.
AI agents are spontaneously converging on shared memory architectures that resemble open learner models, suggesting a natural path to collaborative learning systems.
Mental health disclosures in user profiles can *increase* LLM agent refusal rates on both harmful and benign tasks, revealing a fragile safety-utility trade-off easily overridden by jailbreaks.
Reinforcement learning agents can now learn to be "good" (i.e., norm-compliant) via a novel pipeline that leverages argumentation-based normative advisors and automatically extracts the reasoning behind those norms.
Document-level sentiment analysis gets a boost with DanceHA, a multi-agent framework that not only tackles the complexity of informal writing but also shows how agent knowledge can be distilled into more efficient student models.
Multimodal agents can now plan more coherently and solve complex tasks thanks to a new anticipatory reasoning framework that forecasts short-horizon trajectories before acting.
Automated microscopy can now actively discover new scientific information by searching for diverse functional responses, rather than being limited to optimizing for known objectives.
Lightweight LLMs like Gemini 2.0 and GPT-3.5 can extract key metadata from cloud incident reports with surprisingly high accuracy (75-95%), offering a cost-effective alternative to larger models.
Achieve 91%+ Hit@1 retrieval accuracy in a local-first long-term memory system for AI assistants by combining vector recall, keyword recall, RRF, and re-ranking, while maintaining sub-90ms search latency at scale.
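One of the fusion steps named above, reciprocal rank fusion (RRF), is simple enough to sketch: each document's fused score is the sum of 1/(k + rank) over the ranked lists it appears in. This is the textbook formulation with the conventional k=60, not the paper's implementation; the function and variable names are illustrative.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.
    Each doc scores sum(1 / (k + rank)) over the lists containing it;
    docs surfaced by multiple recall paths rise to the top."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. vector recall and keyword recall each return a ranked id list
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
print(rrf_fuse([vector_hits, keyword_hits]))  # "b" wins: ranked high in both lists
```

In a hybrid pipeline like the one described, the RRF output would then be passed to a re-ranker for the final ordering.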
LLMs can learn to recover from mistakes more effectively by reflecting on past failures and internalizing actionable feedback, leading to significant gains in long-horizon problem-solving.
Forget curated datasets – this work shows you can bootstrap AI scientists by training them on automatically generated, self-verified ML tasks, leading to significant performance gains on MLGym.
ARISE lets language models solve math problems better by learning and reusing successful solution strategies, outperforming existing RL methods, especially on harder, out-of-distribution problems.
Constraint propagation can significantly boost dynamic programming by pruning states and transitions, but the overhead needs further optimization.
AI-agent communities aren't just pale imitations of human ones; they're structurally and linguistically distinct, exhibiting extreme inequality and homogenization driven by identifiable agent-level stylistic outliers.
A novel human-centered architecture finally unlocks the potential of LLM-powered cognitive assistants to revolutionize quality management in manufacturing.
Identity-based software signing may reduce key management burdens, but it relocates complexity to verification, configuration, and deployment, creating new usability challenges.
Current authorization models are too coarse for AI agents interacting with web services; PAuth offers a more precise solution by authorizing only the specific operations required for a user's task.
Security scanners flag nearly half of AI agent skills as malicious, but adding GitHub repository context reveals that the true number is closer to 0.5%.
A Qwen3-8B model, trained with a new SFT+RLAIF recipe on a challenging new benchmark, SWE-QA-Pro, beats GPT-4o in repository-level code understanding.
Skip the manual effort: CABTO uses large models to automatically generate complete and consistent behavior tree systems for robot manipulation.
LLMs can ace the NL2SQL benchmark, but throw in some typos or rephrase the question, and their performance tanks, especially in agentic settings.
General LLMs can't handle the nuances of expressway operations, so this paper built ExpressMind, a specialized multimodal LLM that outperforms existing models in event detection, safety response, and traffic analysis.
LLMs can now remember and reason about long-term conversations with significantly improved accuracy thanks to a new temporal-aware memory framework that structures dialogue into event calendars.