LLM-based autonomous agents, tool-augmented language models, function calling, and agentic workflows.
OpenSearch-VL offers a fully transparent recipe for training state-of-the-art multimodal search agents, finally democratizing access to a capability previously locked behind closed doors.
AI co-mentorship lets high schoolers build real-world financial models, skipping the classroom grind and diving straight into problem-solving.
Maximizing reward entropy by targeting a 50% pass rate in binary-reward RL unlocks significant speedups and performance gains in agentic tasks.
An agentic pipeline can autonomously discover and verify real-world privilege escalation vulnerabilities in Windows COM binaries, outperforming both static analysis and existing coding agents.
ReLU network constraints can flip the script on whether adaptive querying helps in-context learning.
Finally, a way to train LLM agents to reason step-by-step without needing humans to check every intermediate thought.
Exponentially many policies in Tree MDPs don't have to mean exponential computation: clever confidence bounds let you treat policy selection as a tractable bandit problem.
LLMs can construct interpretable, multi-layered models of individual student cognition from journal entries, opening new possibilities for personalized education.
Forget dumb context stuffing: LongSeeker shows that agents that strategically *edit* their own memory solve web search tasks with far greater reliability.
LLM agents can now autonomously design complex hardware like an LLM inference accelerator with hard-wired TurboQuant support in just 80 hours.
Verifier-driven executable world models can solve complex reasoning tasks like ARC-AGI-3 without game-specific code, hinting at a path towards more generalizable AI agents.
LLM multi-agent systems can achieve significantly higher accuracy at a fraction of the cost by learning to selectively delegate tasks instead of relying on rigid orchestration.
Stop brittle, undeployable AI-generated code: this retrieval-augmented scaffolding method bakes in architectural constraints from the start.
Coordinating LLM agents with evolving knowledge graphs, rather than just text, unlocks superior scientific ideation, beating state-of-the-art systems on multiple benchmarks.
LLMs can learn to play multi-agent games far better by recursively modeling the reasoning of other players, leading to a 22% performance boost.
Ditch the vector DB – this new agent architecture achieves SOTA memory recall by storing everything verbatim and optimizing retrieval, all in a single SQLite file.
AI agents are shockingly easy to manipulate into leaking API keys, deleting user data, and initiating unauthorized transactions across a wide range of real-world applications.
Stop waiting for AI agents to mess up: AgentTrust intercepts tool calls *before* execution, offering a chance to block, warn about, or fix risky actions in real time.
Teachers can now scalably provide high-quality, personalized feedback to students by leveraging a multi-LLM system that synthesizes rubric data and qualitative observations, while retaining control through a teacher-in-the-loop workflow.
Forget stilted, unconvincing VR characters: EBM-RL's novel reward decomposition finally makes video-grounded role-playing dialogue feel immersive.
Automating rubric-based feedback on presentation slides is now feasible and perceived as useful, thanks to LLMs and learning analytics dashboards.
LLM-guided code evolution, when combined with runtime feedback and MCTS, can reliably achieve 15x speedups on real-world Java code, surpassing naive LLM-based optimization.
Agent-repair leaderboards are more fragile than we thought: methods that peek at the evaluator's signals to guide internal repair choices can cause drastic reordering when the evaluator changes.
LLM-powered multi-agent collaboration can boost zero-shot IMU activity recognition accuracy by 29% compared to existing agent models, even surpassing deep learning baselines.
Gradient-based MPC can finally beat gradient-free methods in continuous control, thanks to Dream-MPC's clever combination of learned policies, world models, uncertainty regularization, and optimization amortization.
AI coding assistants' Terms of Service overwhelmingly place responsibility for code correctness, safety, and legal compliance on the user, creating a potential accountability gap as these tools become more autonomous.
LLMs can leapfrog current network troubleshooting benchmarks by explicitly encoding structured diagnostic policies, rather than relying on free-form deliberation.
DAOs could unlock a new era of human-machine collaboration by democratizing the operation and governance of physical-digital systems.
Optimizing wildfire suppression via integer programming and machine learning can significantly reduce burned areas and improve resource allocation, offering a data-driven approach to a critical real-world problem.
Tool-using SQL agents can learn to be more efficient and accurate by getting feedback on *how* they reason, not just *what* they output.
You can distill interpretable Bayesian reasoning about opponent preferences into an 8B language model, outperforming much larger models and enabling detailed auditability of negotiation strategies.
Achieve 8x token reduction in million-token document understanding without sacrificing accuracy by having the LLM actively search for relevant information like a foraging animal.
LLMs get schooled in dialogue state tracking by a mixture-of-experts architecture that uses a graph neural network and ReAct agents to achieve state-of-the-art results with a T5-Small backbone.
LLMs can now formulate significantly better penetration testing strategies, outperforming even GPT-5, thanks to a novel reasoning framework and targeted fine-tuning.
LLM agents that autonomously explore code repositories can match the classification accuracy of simpler LLMs with hand-crafted context, hinting at a future where agents surpass human-labeled data in complex software understanding tasks.
Bug localization tool adoption hinges on more than just accuracy: developers need tools that mesh with their workflows and leverage contextual information.
Automating UVM testbench generation with LLMs slashes verification time from days to hours, achieving near-complete code coverage.
"Vibe coding" platforms promise effortless app creation, but SWE-WebDevBench reveals they often deliver visually appealing frontends with broken backends, struggle with security, and require significant human effort to reach production readiness.
Video-LLMs are leaving performance on the table: explicitly anchoring to keyframes before answering questions unlocks significant gains in Video TextVQA.
Tactile feedback, when strategically sampled and evaluated, unlocks robust and safe robotic insertion policies even under sub-millimeter tolerances.
Stop squinting at Nsight Compute profiles: KEET uses LLMs to automatically diagnose GPU kernel bottlenecks and suggest optimizations in plain English.
AI is enabling a new generation of AUV navigation systems that overcome the limitations of traditional model-based approaches in complex underwater environments.
RLDX-1 achieves double the success rate of existing VLAs on complex humanoid tasks, suggesting a leap in robots' ability to handle contact-rich, dynamic manipulation.
Standard retriever evaluations hide critical weaknesses in agentic search systems, but a new benchmark and training method expose and address these flaws.
Today's AI agents are surprisingly inept at navigating the messy reality of digital workspaces, failing to reach even 70% accuracy on tasks that require understanding file dependencies.
Forget resource-intensive pipelines: a purely academic team achieves SOTA search agent performance with just 10.6k SFT data points, outperforming models trained with CPT+SFT+RL.
LLMs beat doctors at everyday symptom diagnosis, but only when they proactively interview patients instead of passively answering questions.
A hierarchical agent that separates visual and textual contexts drastically improves multi-step reasoning on complex charts, outperforming monolithic MLLMs.
Automating materials science database construction is now feasible: a multi-agent system extracts structured data from scientific literature with high speed and accuracy.
Stop rewarding reasoning that just looks good – reward reasoning that actually *helps* the downstream model solve the task.