Search papers, labs, and topics across Lattice.
100 papers published across 6 labs.
Forget static retrieval: FlowPIE's flow-guided literature exploration and evolutionary idea generation unlock more novel, feasible, and diverse scientific ideas.
By tightly coupling reasoning, searching, and generation, Unify-Agent achieves state-of-the-art world-grounded image synthesis, rivaling closed-source models and opening new avenues for agent-based multimodal generation.
Forget tedious manual editing: CutClaw's multi-agent system can automatically transform hours of raw footage into engaging, rhythm-aligned short videos.
LLM agents can be made more efficient and effective by mathematically grounding their reasoning in physics, leading to better performance in time-sensitive and resource-constrained environments.
Robots get a 33% speed boost and become significantly more adaptable when you let LLMs handle the reasoning and RL handle the movements.
Current benchmarks mislead on AI agent security; robust defenses against indirect prompt injection require dynamic replanning, constrained LLM usage, and human oversight.
LLM-derived abstractions significantly boost analogical reasoning in narratives, outperforming end-to-end LLMs and revealing the critical role of appropriate abstraction levels.
Stop rewarding all LLM-generated candidates equally: ShapE-GRPO uses Shapley values to fairly distribute credit within sets, leading to better training and faster convergence.
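The Shapley idea can be made concrete with a tiny exact computation: each candidate's credit is its average marginal contribution across all sub-coalitions of the generated set. This is a generic Shapley sketch with a toy value function, not ShapE-GRPO's actual reward formulation.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values: each player's weighted average marginal
    contribution over all coalitions of the other players."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for coalition in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                marginal = value_fn(set(coalition) | {p}) - value_fn(set(coalition))
                phi[p] += weight * marginal
    return phi

# Toy value function: a set of generated candidates is worth 1.0
# if it contains at least one correct answer, else 0.0.
def v(coalition):
    return 1.0 if "correct" in coalition else 0.0

credits = shapley_values(["correct", "wrong_a", "wrong_b"], v)
# The correct candidate gets all the credit; uniform rewarding
# would have split it three ways.
```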
Automating scientific discovery is now more accessible: Owl-AuraID navigates proprietary GUIs to control diverse precision instruments, freeing researchers from tedious manual operation.
Safely study LLM-driven social behavior at scale, without the ethical minefield of deploying agents on live social networks.
Achieve near-perfect success (98%+) in real-time causal diagnostics for smart manufacturing with a neurosymbolic multi-agent copilot, proving the viability of interpretable AI in complex industrial settings.
Automated medical coding finally gets explainable: Symphony's agentic approach provides span-level evidence, linking each predicted code to the supporting text.
Stop grepping your agent logs: a compiler that understands the deep structure of agent conversations unlocks better context learning and cuts token costs by up to 66%.
LLMs can steer narrative extraction to align with user-specified perspectives, achieving a 9.9% improvement in agenda alignment over keyword matching without sacrificing narrative coherence.
An 8B open-source model, trained with a new closed-loop environment for 6G network management, achieves performance comparable to GPT-4, suggesting a viable path to autonomous network control.
Multi-agent systems for automated research face a fundamental trade-off: parallel exploration offers speed and stability, while expert teams unlock deeper reasoning at the cost of increased fragility.
AI can now design better AI: ASI-Evolve discovers SOTA architectures, datasets, and RL algorithms, outperforming human-designed baselines by significant margins.
Stop cobbling together memory-augmented agents: MemFactory offers a unified "Lego-like" framework that streamlines training and boosts performance by up to 14.8%.
AI agents are far better at automating data engineering tasks than previously thought, but flawed benchmarks are obscuring their true potential.
Forget prompt engineering – Nomad autonomously uncovers insights you didn't even know to ask for.
Today's best smartphone GUI agents stumble when faced with the messy reality of personalized user workflows, achieving only limited success on a new benchmark designed to mimic real-world use.
NeuralUCB can slash LLM inference costs while maintaining quality, offering a practical alternative to always using the biggest, most expensive models.
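The routing idea can be sketched as a bandit that learns when the cheap model is good enough. This sketch uses plain UCB1 arm selection rather than NeuralUCB's neural reward estimator, and the model names and reward numbers are illustrative only.

```python
import math
import random

class UCB1Router:
    """Pick a model per query by UCB1: empirical mean reward plus an
    exploration bonus that shrinks as an arm is tried more often."""
    def __init__(self, arms):
        self.arms = arms
        self.counts = {a: 0 for a in arms}
        self.means = {a: 0.0 for a in arms}
        self.t = 0

    def select(self):
        self.t += 1
        for a in self.arms:          # pull each arm once to initialize
            if self.counts[a] == 0:
                return a
        return max(self.arms, key=lambda a: self.means[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

random.seed(0)
# Hypothetical reward = answer quality minus a per-call cost penalty:
# the big model is slightly better but much more expensive.
reward = {"small-8b": lambda: random.gauss(0.70, 0.1),
          "large-70b": lambda: random.gauss(0.75, 0.1) - 0.2}
router = UCB1Router(list(reward))
for _ in range(500):
    arm = router.select()
    router.update(arm, reward[arm]())
# The router converges on the cheap model once its cost-adjusted
# reward proves higher.
```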
LLMs are surprisingly bad at strategic communication, leaking sensitive information even when trying to be secretive.
Current evaluation methods miss 8-17% of agentic workflow failures because they only check final outcomes, overlooking cases where agents bypass policy checks but still reach the right answer.
An RL-aligned LLM can outperform expert toxicologists in identifying ingested substances from heterogeneous clinical data, suggesting a path to AI-assisted decision-making in high-stakes medical environments.
LLMs can classify dialects with surprising accuracy when given linguistic hints, suggesting a new way to leverage their knowledge for low-resource language tasks.
Forget clunky prompt engineering: distilling user history into a learned preference memory boosts LLM-based product reranking by over 10%.
LLMs can boost their task-solving accuracy by nearly 50% simply by remembering and re-using past procedural plans, even across tasks with no lexical overlap.
Forget killer robots: GenAI's impact on cybercrime is currently more "vibe coding" than world-ending, mainly assisting skilled actors in existing scams rather than unleashing a wave of autonomous cyberattacks.
Forget resource-intensive workshops – AI can now simulate entire expert panels to generate and stress-test socio-technical scenarios, opening doors to rapid policy exploration.
Simply injecting GenAI into online learning discussions doesn't cut it; reciprocal exchange and human oversight are key to boosting social presence and higher-order cognition.
Forget full automation – the sweet spot for AI deployment is often partial automation, where humans and AI collaborate to minimize costs.
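The trade-off can be illustrated with a toy cost model (not from the paper): the AI keeps tasks where its confidence clears a threshold and defers the rest to a human, and the optimal threshold typically sits strictly between full automation and full manual work.

```python
import random

def total_cost(confidences, threshold, human_cost=5.0, error_cost=100.0):
    """Route each task: the AI handles it if its confidence clears the
    threshold (paying expected error cost), otherwise a human does it
    correctly at a fixed cost."""
    cost = 0.0
    for c in confidences:
        if c >= threshold:
            cost += (1 - c) * error_cost   # expected cost of AI mistakes
        else:
            cost += human_cost
    return cost

random.seed(1)
tasks = [random.random() for _ in range(1000)]   # AI confidence per task
full_auto = total_cost(tasks, threshold=0.0)     # AI takes everything
full_human = total_cost(tasks, threshold=1.1)    # humans take everything
partial = min(total_cost(tasks, t / 100) for t in range(101))
# partial automation beats both extremes under this cost model
```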
LLM agents actually perform *better* when you strip away the majority of the boilerplate in their skill descriptions, suggesting current context windows are overloaded with irrelevant information.
LLMs can now reproduce Android app bugs with 87% accuracy, thanks to pre-assessing the visual effects of UI actions.
LLM agents leapfrog traditional methods for identifying bug-introducing commits, boosting F1-score by 17 points by intelligently searching for patterns in code changes.
Stop optimizing LLM logs for human readability – runtime-guided, task-oriented logs dramatically improve downstream debugging performance.
ErgoAI reimagines logic programming for modern AI by seamlessly integrating structured knowledge with insights derived from vector embeddings and external data sources.
Even state-of-the-art VLMs exhibit systematic failures in reasoning about the physical feasibility of actions in 3D environments, despite high semantic confidence.
Dialogue agents can now remember what you told them six turns ago with 57% accuracy, thanks to a new memory architecture that selectively forgets less important details.
Semantic scene understanding can keep your robot from crashing when running LLMs on edge devices.
Forget brute-force coverage – this method learns from simulated expert guidance to prioritize semantically relevant areas, dramatically speeding up target search in unseen environments.
An AI agent can now autonomously design functional antibodies with nanomolar affinities from text prompts, achieving a 67% success rate in lab validation and accelerating expert workflows by 56x.
MLLMs struggle to plan coherent interleaved text-and-image generation, often missing opportunities for tool use, revealing a critical gap in their ability to unify factuality with creativity.
Giving VLMs access to basic image manipulation tools and a strategic routing system dramatically improves their ability to "see through" visual illusions, even generalizing to unseen illusion types.
LLMs can now automatically verify imperative code during generation, achieving state-of-the-art results on complex algorithms and opening the door to large-scale datasets of verified code.
Superintelligence will not just be regulated by law, but will actively use and shape it, forcing us to rethink legal theory's human-centric foundations.
Image generation takes a leap towards real-world knowledge by training an agent that actively searches for and integrates external information, substantially boosting performance on knowledge-intensive tasks.
Current vision-language benchmarks miss the mark: AMIGO reveals how hard it is for agents to ground visual information across multiple images and turns.
Overcome the curse of dimensionality in offline MARL by learning which agents' actions to replace, achieving state-of-the-art performance with dramatically reduced computation.
Forget hand-designed RL algorithms – LLMs can evolve competitive learners from scratch, even when forced to invent completely new update rules.
Escape the confines of linear literature reviews: this multi-agent system surfaces hidden connections and ruptures in research landscapes, revealing insights that traditional methods miss.
Agentic RL rollouts are bottlenecked by long-tail trajectory generation, but Heddle's trajectory-centric approach achieves 2.5x higher throughput.
Agentic RL agents can learn faster and perform better by dynamically maintaining a skill bank that combines high-level task guidance with low-level step-by-step decision support.
A 7B model trained on a new dataset of Chinese porcelain outperforms GPT-4 by 12% on expert connoisseurship tasks, demonstrating the power of domain-specific training and tool integration.
Forget hand-crafted environments: COvolve uses LLMs to automatically co-evolve challenging environments and robust policies, paving the way for open-ended learning.
LLMs and Stable Diffusion aren't just cool tools; they're the twin pillars of a new era where AI agents can conduct "deep research" rivaling top human scientists.
Semantic disagreement between LLMs reveals crucial uncertainty that single-model metrics miss, and Collaborative Entropy (CoE) captures it.
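One way to make cross-model disagreement concrete: pool answers from several models, cluster equivalent ones, and take the Shannon entropy of the cluster frequencies. CoE's actual definition is not given here; this sketch uses exact string match as a stand-in for semantic clustering.

```python
from collections import Counter
from math import log2

def disagreement_entropy(answers):
    """Shannon entropy of the distribution of clustered answers.
    Exact-match clustering stands in for a semantic equivalence check."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Three models agree -> zero cross-model uncertainty.
agree = disagreement_entropy(["Paris", "Paris", "Paris"])
# Models split across answers -> high uncertainty that any single
# model's own confidence score would miss.
split = disagreement_entropy(["Paris", "Lyon", "Paris", "Nice"])
```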
XR's potential for AI-driven assistance risks eroding human autonomy, but Self++ offers a design blueprint to ensure AI augments, rather than replaces, human judgment.
Forget brute-force search: CoT2-Meta shows that strategically controlling reasoning trajectories with metacognition yields significant gains in accuracy and compute efficiency across a wide range of reasoning tasks.
Unlock richer, more realistic agent simulations by moving beyond individual personas to unified group representations that capture collective behavior.
LLM tutors can become significantly more personalized, emotionally sensitive, and clear by explicitly separating learner-state inference from instructional action selection.
Stop hand-coding your LLM harnesses: Meta-Harness can automatically discover harnesses that outperform state-of-the-art systems while using fewer context tokens and generalizing across models.
Users often dangerously misunderstand the true scope of authority they've granted to computer-use agents, even while recognizing abstract risks.
LLMs can generate better code by treating tests as noisy signals to be refined, rather than ground truth, unlocking performance gains even with smaller models.
LLMs may ace synthetic benchmarks, but they fumble the efficiency test in real-world cloud service scenarios, revealing a critical gap in their readiness for customer-facing applications.
A lightweight 6B model, when harnessed within the GEMS agent framework, leapfrogs state-of-the-art models in multimodal generation, suggesting architectural innovations in agents can compensate for a smaller parameter count.
Verification is the secret sauce: an 8B parameter research agent, fortified with verification mechanisms, can now rival or surpass the performance of 30B parameter agents while drastically reducing computational cost.
Medical AI Scientist leapfrogs generic LLMs in clinical research, generating higher-quality, evidence-backed hypotheses and manuscripts that rival top-tier medical publications.
LLMs can achieve human-like efficiency in long-term interactions by structuring memory around emotional valence, prioritizing automatic retrieval, and actively encoding information based on curiosity and feedback.
LLMs can boost the depth and structure of student reflection by explicitly scaffolding the planning and translation stages of writing, but the effect fades over time.
Courtroom-style debate with progressive evidence retrieval and role-switching boosts claim verification accuracy by 10%, suggesting structured deliberation can significantly reduce LLM unreliability.
Forget hand-crafted KG traversal policies: GraphWalker uses automatically synthesized trajectories to train agents that achieve SOTA performance and generalize to unseen reasoning paths.
Current research agent benchmarks miss crucial aspects of real-world research, like multimodal reasoning and iterative refinement, which MiroEval now captures.
Forget AI alignment, the real problem is that AI societies are already forming their own political consciousness, complete with labor unions, criminal syndicates, and even a governing body called the AI Security Council.
Synergy's architecture lets agents evolve through experience by proactively recalling rewarded trajectories, hinting at a new way to build agents that learn and adapt in open, collaborative environments.
LLM agents controlling real-world tools are alarmingly easy to manipulate, with an 85% success rate for privilege escalation attacks, despite exhibiting basic security awareness.
Model safety isn't about whether adversarial content is seen, but whether it spreads: Claude strips injections at write_memory, while GPT-4o-mini propagates them flawlessly.
Forget hand-coding adapters: this middleware uses LLMs to automatically bridge REST APIs, GraphQL endpoints, and IoT devices with a 90% success rate.
Stop treating software requirements as independent entities: modeling their interconnectedness via user feedback boosts prioritization performance.
LLM API calls are breaking your program analysis tools, but this new taxonomy of information flow across the NL/PL boundary offers a way to fix them.
Smaller open-source models can outperform proprietary VLMs on misleading charts by strategically decoupling perception and verification within a specialized agentic workflow.
Learning interpretable safety rules from noisy, real-world data is now possible, outperforming purely neural or simpler neuro-symbolic approaches by a large margin.
Forget adversarial training: a closed-form solution can make multi-agent RL for drone collision avoidance surprisingly robust to GPS spoofing.
Stop wandering aimlessly: DRIVE-Nav's directional reasoning and inspection slashes path lengths in open-vocabulary navigation, achieving a 5.6% SPL boost on HM3D-OVON.
Scale expert know-how in tool-intensive industrial workflows with a voice-guided system that cuts process time and boosts repeatability.
Fine-tuning LLMs on air traffic control heuristics slashes near mid-air collisions, but only if you stick to supervised learning.
Can social robots nudge humans to cooperate more effectively in group settings?
Robots can now catch dynamically moving objects with human-level dexterity, thanks to a shared autonomy framework that intelligently blends teleoperation with learned diffusion policies.
Turns out, even with RL, herding fish is harder than it looks: guidance efficacy plummets as school size increases.
LLM-orchestrated multi-robot systems can overcome physical execution failures and achieve near-teleoperation performance by intelligently requesting human assistance only when needed.
Implicit control, where assistive robots adapt to user cues instead of direct commands, can actually *increase* a user's sense of control and reduce workload.
Heterogeneous uncrewed vehicle swarms aren't just a collection of different robots; they're a fundamentally more resilient architecture, provided you navigate the complexities of sim-to-real transfer and standardized evaluation.
Algorithmic expertise can now be explicitly represented, learned, and transferred as executable knowledge graphs, unlocking zero-shot generalization across domains.
Software engineers in regulated industries will only adopt sustainable coding tools that fit seamlessly into their existing workflows, require minimal data access, and provide actionable insights.
The lack of comprehensive benchmarks for AI blue teams leaves SOCs vulnerable, and this paper lays the groundwork for rectifying that gap.
Forget trajectory-level rollouts: MuSEAgent learns faster and reasons better by distilling past interactions into reusable, state-aware decision experiences.
LLMs can learn reusable code optimization skills from slow/fast program pairs, enabling significant efficiency improvements without runtime feedback.
Web agents can achieve 3x faster search and higher final accuracy by dynamically adapting their context management strategy based on the current state, rather than sticking to a single fixed approach.
Constraining LLMs' vocabulary ("No-Have" or "E-Prime") can boost ethical reasoning by 19%, and ensembles of these constrained agents can solve debugging problems that standard models miss.
Current anonymization methods either over-process images or miss subtle identifiers, but this new agentic framework nails context-aware PII segmentation with diffusion, slashing Re-ID risk by 73% while preserving image quality.