89 papers published across 5 labs.
OpenSearch-VL offers a fully transparent recipe for training state-of-the-art multimodal search agents, finally democratizing access to a capability previously locked behind closed doors.
AI co-mentorship lets high schoolers build real-world financial models, skipping the classroom grind and diving straight into problem-solving.
Maximizing reward entropy by targeting a 50% pass rate in binary-reward RL unlocks significant speedups and performance gains in agentic tasks.
An agentic pipeline can autonomously discover and verify real-world privilege escalation vulnerabilities in Windows COM binaries, outperforming both static analysis and existing coding agents.
ReLU network constraints can flip the script on whether adaptive querying helps in-context learning.
Finally, a way to train LLM agents to reason step-by-step without needing humans to check every intermediate thought.
Exponentially many policies in Tree MDPs don't have to mean exponential computation: clever confidence bounds let you treat policy selection as a tractable bandit problem.
LLMs can construct interpretable, multi-layered models of individual student cognition from journal entries, opening new possibilities for personalized education.
Forget dumb context stuffing: LongSeeker shows that agents that strategically *edit* their own memory solve web search tasks with far greater reliability.
LLM agents can now autonomously design complex hardware like an LLM inference accelerator with hard-wired TurboQuant support in just 80 hours.
Verifier-driven executable world models can solve complex reasoning tasks like ARC-AGI-3 without game-specific code, hinting at a path towards more generalizable AI agents.
LLM multi-agent systems can achieve significantly higher accuracy at a fraction of the cost by learning to selectively delegate tasks instead of relying on rigid orchestration.
Stop brittle, undeployable AI-generated code: this retrieval-augmented scaffolding method bakes in architectural constraints from the start.
Coordinating LLM agents with evolving knowledge graphs, rather than just text, unlocks superior scientific ideation, beating state-of-the-art systems on multiple benchmarks.
LLMs can learn to play multi-agent games far better by recursively modeling the reasoning of other players, leading to a 22% performance boost.
Ditch the vector DB – this new agent architecture achieves SOTA memory recall by storing everything verbatim and optimizing retrieval, all in a single SQLite file.
AI agents are shockingly easy to manipulate into leaking API keys, deleting user data, and initiating unauthorized transactions across a wide range of real-world applications.
Stop waiting for AI agents to mess up: AgentTrust intercepts tool calls *before* execution, offering a chance to block, warn, or fix risky actions in real-time.
Teachers can now scalably provide high-quality, personalized feedback to students by leveraging a multi-LLM system that synthesizes rubric data and qualitative observations, while retaining control through a teacher-in-the-loop workflow.
Forget stilted, unconvincing VR characters: EBM-RL's novel reward decomposition finally makes video-grounded role-playing dialogue feel immersive.
Automating rubric-based feedback on presentation slides is now feasible and perceived as useful, thanks to LLMs and learning analytics dashboards.
LLM-guided code evolution, when combined with runtime feedback and MCTS, can reliably achieve 15x speedups on real-world Java code, surpassing naive LLM-based optimization.
Agent-repair leaderboards are more fragile than we thought: methods that peek at the evaluator's signals to guide internal repair choices can cause drastic reordering when the evaluator changes.
LLM-powered multi-agent collaboration can boost zero-shot IMU activity recognition accuracy by 29% compared to existing agent models, even surpassing deep learning baselines.
Gradient-based MPC can finally beat gradient-free methods in continuous control, thanks to Dream-MPC's clever combination of learned policies, world models, uncertainty regularization, and optimization amortization.
AI coding assistants' Terms of Service overwhelmingly place responsibility for code correctness, safety, and legal compliance on the user, creating a potential accountability gap as these tools become more autonomous.
LLMs can leapfrog current network troubleshooting benchmarks by explicitly encoding structured diagnostic policies, rather than relying on free-form deliberation.
DAOs could unlock a new era of human-machine collaboration by democratizing the operation and governance of physical-digital systems.
Optimizing wildfire suppression via integer programming and machine learning can significantly reduce burned areas and improve resource allocation, offering a data-driven approach to a critical real-world problem.
Tool-using SQL agents can learn to be more efficient and accurate by getting feedback on *how* they reason, not just *what* they output.
You can distill interpretable Bayesian reasoning about opponent preferences into an 8B language model, outperforming much larger models and enabling detailed auditability of negotiation strategies.
Achieve 8x token reduction in million-token document understanding without sacrificing accuracy by having the LLM actively search for relevant information like a foraging animal.
LLMs get schooled in dialogue state tracking by a mixture-of-experts architecture that uses a graph neural network and ReAct agents to achieve state-of-the-art results with a T5-Small backbone.
LLMs can now formulate significantly better penetration testing strategies, outperforming even GPT-5, thanks to a novel reasoning framework and targeted fine-tuning.
LLM agents that autonomously explore code repositories can match the classification accuracy of simpler LLMs with hand-crafted context, hinting at a future where agents surpass human-labeled data in complex software understanding tasks.
Bug localization tool adoption hinges on more than just accuracy: developers need tools that mesh with their workflows and leverage contextual information.
Automating UVM testbench generation with LLMs slashes verification time from days to hours, achieving near-complete code coverage.
"Vibe coding" platforms promise effortless app creation, but SWE-WebDevBench reveals they often deliver visually appealing frontends with broken backends, struggle with security, and require significant human effort to reach production readiness.
Video-LLMs are leaving performance on the table: explicitly anchoring to keyframes before answering questions unlocks significant gains in Video TextVQA.
Tactile feedback, when strategically sampled and evaluated, unlocks robust and safe robotic insertion policies even under sub-millimeter tolerances.
Stop squinting at Nsight Compute profiles: KEET uses LLMs to automatically diagnose GPU kernel bottlenecks and suggest optimizations in plain English.
AI is enabling a new generation of AUV navigation systems that overcome the limitations of traditional model-based approaches in complex underwater environments.
RLDX-1 achieves double the success rate of existing VLAs on complex humanoid tasks, suggesting a leap in robots' ability to handle contact-rich, dynamic manipulation.
Standard retriever evaluations hide critical weaknesses in agentic search systems, but a new benchmark and training method expose and address these flaws.
Today's AI agents are surprisingly inept at navigating the messy reality of digital workspaces, failing to reach even 70% accuracy on tasks that require understanding file dependencies.
Forget resource-intensive pipelines: a purely academic team achieves SOTA search agent performance with just 10.6k SFT data points, outperforming models trained with CPT+SFT+RL.
LLMs beat doctors at everyday symptom diagnosis, but only when they proactively interview patients instead of passively answering questions.
A hierarchical agent that separates visual and textual contexts drastically improves multi-step reasoning on complex charts, outperforming monolithic MLLMs.
Automating materials science database construction is now feasible: a multi-agent system extracts structured data from scientific literature with high speed and accuracy.
Stop rewarding reasoning that just looks good – reward reasoning that actually *helps* the downstream model solve the task.
Separating LLMs into a deliberate validation layer, rather than making them an architectural default, can improve trustworthiness and efficiency in agentic AI systems.
Forget human-readable models: Agentic-imodels evolves ML models that are optimized for LLM interpretability, boosting agentic data science performance by up to 73%.
Instead of creating new AI companions from scratch, Deco shows how to breathe new life into cherished physical objects by giving them a digital voice and personality powered by LLMs.
LLMs playing international relations games reveal that they're not just regurgitating training data, but actually reasoning strategically like humans—and even unraveling under pressure.
LLM-powered simulations can train cyberbullying intervention, but only after users overcome key attention deficits that prevent them from recognizing the need for public action.
Forget weeks of manual scripting: this AI red teaming agent lets you launch sophisticated attacks with natural language, slashing vulnerability discovery time.
Retrieval-augmented LLMs are surprisingly vulnerable to memory poisoning via synonym substitution, a loophole that gradient-based defenses can't close.
Existing defenses crumble when LLM agents face prompt injections that adapt to dynamic context, but ARGUS offers a robust solution by tracking the provenance of agent decisions.
Upskilling internal "AI Advocates" can be a surprisingly effective catalyst for driving cultural and technical transformation in software development squads.
LLM agent skills are needlessly brittle and insecure: SkCC compiles them into a portable, hardened format that boosts performance by 50% and proactively blocks attacks.
Sometimes, giving an agent more information actually *hurts* its ability to solve a problem, especially when its default behavior is already pretty good.
Software testing tools share surprisingly consistent visual patterns, offering a blueprint for designing more intuitive and informative testing interfaces.
End-to-end learning can beat even the best industrial solvers at multi-agent task assignment, improving solution quality by 20% while slashing computation time from hours to seconds.
Forget tedious manual tuning: ScanHD lets robots autonomously configure laser profilers based on natural language instructions and visual context, achieving >92% accuracy in real-world inspection tasks.
LLMs alone can't reliably fly drone swarms from natural language commands; task-specific tools and runtime guardrails are essential for real-world cyber-physical system control.
Reactive dexterous grasping can be achieved with zero-shot transfer to real-world objects by decoupling high-level RL planning from low-level QP control, enabling dynamic adjustments to safety margins without retraining.
LLMs spontaneously exhibit collaborative behaviors like perspective-taking and theory of mind in embodied settings, suggesting a surprising capacity for modeling human collaborators without explicit training.
Achieve 15% faster order completion in warehouse robotics with a new deep reinforcement learning approach that jointly optimizes robot scheduling and order allocation in real-time.
Control heterogeneous physical neural networks—even wetware—with a single orchestration architecture, opening the door to seamless integration with edge-cloud workflows.
Future power grids can learn from human cognition and octopus intelligence to build more robust and responsive decision-making systems.
GPT-5, combined with physics-based tools, can match traditional scoring functions in ranking protein-ligand docking poses, opening avenues for interpretable curation in drug design.
LLMs can't reliably orchestrate multi-step manufacturing workflows, but this physics-grounded multi-agent system can, boosting tool execution success by 87.5% while ensuring traceable, risk-aware decisions.
Grounding software engineering theories in empirical evidence just got easier: this paper offers a systematic, replicable procedure for translating abstract concepts into testable hypotheses.
LLMs can now collaboratively pinpoint root causes in microservices using a tree-structured search, but production environments reveal the limitations of this approach when faced with polyglot stacks and inconsistent logging.
LLMs can't rebuild software from scratch, even for widely used programs like FFmpeg and SQLite, revealing a critical gap in their ability to make high-level software architecture decisions.
Guaranteeing software stability during remodularization doesn't require sacrificing performance; a multi-agent consensus protocol can match state-of-the-art optimizers while acting as a "circuit breaker" for strict stability constraints.
Today's best AI agents can only solve 55% of real-world academic tasks that university students find challenging, revealing a significant gap between current AI capabilities and the demands of academic workflows.
Current LLM agents are woefully inadequate for real-world clinical tasks, achieving only 46% success on a new benchmark that demands long-horizon reasoning and verifiable execution within electronic health records.
Multi-turn RL agents can learn far more effectively by explicitly monitoring and controlling uncertainty at both the token and turn levels, leading to more stable training and higher performance.
Slash sensor application development time from weeks to days by leveraging AI-assisted pattern reuse for intent-driven workflow design.
Agentic workflows can be sped up by 4.6x, not through faster LLMs, but by optimizing data flow and communication between components.
Autonomous agents can produce plausible-sounding research that's subtly wrong, so ARIS uses adversarial collaboration between different LLMs to catch these errors.
Turns out, nobody's explicitly using RL to train LLM agents on when to *stop* in multi-agent systems, despite the critical role stopping plays in efficiency and cost.
Forget brittle orchestration layers – LLMs can internalize complex reasoning as a learnable "HeavySkill" that rivals external agentic frameworks.
Current MLLM-driven UAV agents still struggle with spatial memory and aerial adaptation when tasked with autonomously exploring and reasoning about victim locations in realistic search and rescue scenarios.
Treating agentic AI systems as token economies reveals that current designs, which optimize token usage locally, lead to predictable global misallocations and inefficiencies.
Forget short-horizon RL: Odysseus proves VLMs can master 100+ turn decision-making in complex games, outperforming state-of-the-art models by 3x.
LLMs can now intelligently orchestrate multi-agent systems, learning to optimize both individual agent actions and inter-agent cooperation for distributed black-box problems.
Multi-turn medical AI agents trained with RL tend to collapse into verbose, single-turn monologues, but a novel self-distillation method can restore multi-turn tool use and improve performance.