LLM-based autonomous agents, tool-augmented language models, function calling, and agentic workflows.
Forget static retrieval: FlowPIE's flow-guided literature exploration and evolutionary idea generation unlock more novel, feasible, and diverse scientific ideas.
By tightly coupling reasoning, searching, and generation, Unify-Agent achieves state-of-the-art world-grounded image synthesis, rivaling closed-source models and opening new avenues for agent-based multimodal generation.
Forget tedious manual editing: CutClaw's multi-agent system can automatically transform hours of raw footage into engaging, rhythm-aligned short videos.
LLM agents can be made more efficient and effective by mathematically grounding their reasoning in physics, leading to better performance in time-sensitive and resource-constrained environments.
Robots get a 33% speed boost and become significantly more adaptable when you let LLMs handle the reasoning and RL handle the movements.
Current benchmarks mislead on AI agent security; robust defenses against indirect prompt injection require dynamic replanning, constrained LLM usage, and human oversight.
LLM-derived abstractions significantly boost analogical reasoning in narratives, outperforming end-to-end LLMs and revealing the critical role of appropriate abstraction levels.
Stop rewarding all LLM-generated candidates equally: ShapE-GRPO uses Shapley values to fairly distribute credit within sets, leading to better training and faster convergence.
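The general idea behind Shapley-based credit assignment is to score each candidate by its average marginal contribution across all subsets of the set. A minimal, hypothetical sketch of that computation (exact Shapley values over a toy value function, not the paper's actual ShapE-GRPO training procedure):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a small candidate set.

    `value` maps a frozenset of candidates to a scalar reward; each
    candidate's Shapley value is its reward-weighted average marginal
    contribution over all orderings of the set.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weight of coalitions of size k in the Shapley formula.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Toy reward: a set scores 1 only if it contains candidate "a", so all
# credit should flow to "a" rather than being split equally.
phi = shapley_values(["a", "b", "c"], lambda s: 1.0 if "a" in s else 0.0)
print(phi)  # → {"a": 1.0, "b": 0.0, "c": 0.0}
```

Under uniform reward-splitting, "b" and "c" would receive the same credit as "a"; the Shapley decomposition instead isolates the candidate that actually drives the set's reward.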
Automating scientific discovery is now more accessible: Owl-AuraID navigates proprietary GUIs to control diverse precision instruments, freeing researchers from tedious manual operation.
Safely study LLM-driven social behavior at scale, without the ethical minefield of deploying agents on live social networks.
Achieve near-perfect success (98%+) in real-time causal diagnostics for smart manufacturing with a neurosymbolic multi-agent copilot, proving the viability of interpretable AI in complex industrial settings.
Automated medical coding finally gets explainable: Symphony's agentic approach provides span-level evidence, linking each predicted code to the supporting text.
Stop grepping your agent logs: a compiler that understands the deep structure of agent conversations unlocks better context learning and cuts token costs by up to 66%.
LLMs can steer narrative extraction to align with user-specified perspectives, achieving a 9.9% improvement in agenda alignment over keyword matching without sacrificing narrative coherence.
An 8B open-source model, trained with a new closed-loop environment for 6G network management, achieves performance comparable to GPT-4, suggesting a viable path to autonomous network control.
Multi-agent systems for automated research face a fundamental trade-off: parallel exploration offers speed and stability, while expert teams unlock deeper reasoning at the cost of increased fragility.
AI can now design better AI: ASI-Evolve discovers SOTA architectures, datasets, and RL algorithms, outperforming human-designed baselines by significant margins.
Stop cobbling together memory-augmented agents: MemFactory offers a unified "Lego-like" framework that streamlines training and boosts performance by up to 14.8%.
AI agents are far better at automating data engineering tasks than previously thought, but flawed benchmarks are obscuring their true potential.
Forget prompt engineering – Nomad autonomously uncovers insights you didn't even know to ask for.
Today's best smartphone GUI agents stumble when faced with the messy reality of personalized user workflows, achieving only limited success on a new benchmark designed to mimic real-world use.
NeuralUCB can slash LLM inference costs while maintaining quality, offering a practical alternative to always using the biggest, most expensive models.
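The routing problem here is a bandit over models: pick the cheap model when its expected quality-minus-cost reward beats the expensive one. A minimal sketch using the classic UCB1 rule over two hypothetical arms (the paper uses NeuralUCB, a contextual neural variant; this only illustrates the underlying explore/exploit logic):

```python
import math
import random

class UCB1Router:
    """UCB1 bandit over candidate models (illustrative only)."""

    def __init__(self, arms):
        self.arms = arms
        self.counts = {a: 0 for a in arms}
        self.total = {a: 0.0 for a in arms}
        self.t = 0

    def select(self):
        self.t += 1
        for a in self.arms:  # play each arm once before using UCB
            if self.counts[a] == 0:
                return a
        def ucb(a):
            mean = self.total[a] / self.counts[a]
            return mean + math.sqrt(2 * math.log(self.t) / self.counts[a])
        return max(self.arms, key=ucb)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total[arm] += reward

random.seed(0)
router = UCB1Router(["small", "large"])
# Simulated reward = answer quality minus inference cost; the large
# model is slightly better but its cost penalty makes the small one
# the better net choice in this toy setup.
reward = {
    "small": lambda: random.gauss(0.70, 0.1),
    "large": lambda: random.gauss(0.75, 0.1) - 0.2,
}
for _ in range(2000):
    arm = router.select()
    router.update(arm, reward[arm]())
print(router.counts)  # the cheaper arm dominates after exploration
```

A contextual method like NeuralUCB additionally conditions the choice on the query itself, so easy queries can route to the small model while hard ones still reach the large model.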
LLMs are surprisingly bad at strategic communication, leaking sensitive information even when trying to be secretive.
Current evaluation methods miss 8-17% of agentic workflow failures because they only check final outcomes, overlooking cases where agents bypass policy checks but still reach the right answer.
An RL-aligned LLM can outperform expert toxicologists in identifying ingested substances from heterogeneous clinical data, suggesting a path to AI-assisted decision-making in high-stakes medical environments.
LLMs can classify dialects with surprising accuracy when given linguistic hints, suggesting a new way to leverage their knowledge for low-resource language tasks.
Forget clunky prompt engineering: distilling user history into a learned preference memory boosts LLM-based product reranking by over 10%.
LLMs can boost their task-solving accuracy by nearly 50% simply by remembering and re-using past procedural plans, even across tasks with no lexical overlap.
Forget killer robots: GenAI's impact on cybercrime is currently more "vibe coding" than world-ending, mainly assisting skilled actors in existing scams rather than unleashing a wave of autonomous cyberattacks.
Forget resource-intensive workshops – AI can now simulate entire expert panels to generate and stress-test socio-technical scenarios, opening doors to rapid policy exploration.
Simply injecting GenAI into online learning discussions doesn't cut it; reciprocal exchange and human oversight are key to boosting social presence and higher-order cognition.
Forget full automation – the sweet spot for AI deployment is often partial automation, where humans and AI collaborate to minimize costs.
LLM agents actually perform *better* when you strip away the majority of the boilerplate in their skill descriptions, suggesting current context windows are overloaded with irrelevant information.
LLMs can now reproduce Android app bugs with 87% accuracy, thanks to pre-assessing the visual effects of UI actions.
LLM agents leapfrog traditional methods for identifying bug-introducing commits, boosting F1-score by 17 points by intelligently searching for patterns in code changes.
Stop optimizing LLM logs for human readability – runtime-guided, task-oriented logs dramatically improve downstream debugging performance.
ErgoAI reimagines logic programming for modern AI by seamlessly integrating structured knowledge with insights derived from vector embeddings and external data sources.
Even state-of-the-art VLMs exhibit systematic failures in reasoning about the physical feasibility of actions in 3D environments, despite high semantic confidence.
Dialogue agents can now remember what you told them six turns ago with 57% accuracy, thanks to a new memory architecture that selectively forgets less important details.
Semantic scene understanding can keep your robot from crashing, even when its LLMs run on resource-constrained edge devices.
Forget brute-force coverage – this method learns from simulated expert guidance to prioritize semantically relevant areas, dramatically speeding up target search in unseen environments.
An AI agent can now autonomously design functional antibodies with nanomolar affinities from text prompts, achieving a 67% success rate in lab validation and accelerating expert workflows by 56x.
MLLMs struggle to plan coherent interleaved text-and-image generation, often missing opportunities for tool use, revealing a critical gap in their ability to unify factuality with creativity.
Giving VLMs access to basic image manipulation tools and a strategic routing system dramatically improves their ability to "see through" visual illusions, even generalizing to unseen illusion types.
LLMs can now automatically verify imperative code during generation, achieving state-of-the-art results on complex algorithms and opening the door to large-scale datasets of verified code.
Superintelligence will not just be regulated by law, but will actively use and shape it, forcing us to rethink legal theory's human-centric foundations.
Image generation takes a leap towards real-world knowledge by training an agent that actively searches for and integrates external information, substantially boosting performance on knowledge-intensive tasks.
Current vision-language benchmarks miss the mark: AMIGO reveals how hard it is for agents to ground visual information across multiple images and turns.
Overcome the curse of dimensionality in offline MARL by learning which agents' actions to replace, achieving state-of-the-art performance with dramatically reduced computation.
Forget hand-designed RL algorithms – LLMs can evolve competitive learners from scratch, even when forced to invent completely new update rules.