Search papers, labs, and topics across Lattice.
We track OpenAI, DeepMind, Anthropic, and 17 other labs daily, with AI-powered summaries, trend charts, and a weekly digest.
We read everything so you don't have to. One email, zero noise.
MaxProof's innovative test-time scaling enables an AI to outperform human champions in mathematical proof competitions.
Humanoid robots can learn to distinguish themselves from others purely through proprioceptive-visual cues, enabling advanced social interactions without predefined identities.
Hallucinations in LLM-generated news summaries can be effectively mitigated using a novel Chain-of-Thought reasoning framework, leading to more accurate and reliable timeline summarization.
Operads could revolutionize how we understand and improve multi-step reasoning in LLMs by providing a robust mathematical framework that enhances answer consistency.
Reducing inter-utterance silence from 9.6 seconds to 0.3 seconds transforms the quality of real-time game commentary, making it feel more natural and engaging.
Surprisingly, the error in estimating counts for hierarchical prefixes remains constant regardless of hierarchy height or the number of heavy hitters.
L-VARC achieves superior visual reasoning performance with just 18 million parameters by cleverly integrating language guidance into the learning process.
CFALR outperforms traditional methods by seamlessly integrating collaborative filtering with large language models for personalized fashion recommendations.
FTP-1 not only excels on familiar tactile sensors but also achieves unprecedented success on unseen setups, redefining the potential for cross-sensor generalization in robotic manipulation.
Current vision-language models struggle with process understanding in robotic manipulation, but targeted post-training can yield significant improvements.
Steering embodiment-agnostic policies with joint-space guidance can slash collision rates by over 90% in real-world robotic tasks.
Molecular reference imbalance can introduce significant errors in quantum Monte Carlo adsorption energy calculations, but a new hybrid cycle effectively mitigates this issue.
Operadic consistency reveals a powerful new way to diagnose reasoning failures in LLMs, achieving correlations with accuracy that exceed traditional confidence metrics.
PERIA-8B not only surpasses state-of-the-art models in spatial reasoning but does so at a fraction of their size, revealing the power of tool-augmented approaches.
TetherCache slashes quality drift in long-form video generation from 7.84 to 1.33, ensuring stability and coherence over extended sequences.
SPARC reduces label noise by leveraging task structure, enabling robots to learn from more reliable demonstrations and outperforming traditional methods in real-world applications.
MSA slashes per-token attention compute by over 28x while maintaining competitive performance, revolutionizing how LLMs can handle ultra-long contexts.
Environment engineering, not just agent workflows, is the key to unlocking the full potential of autonomous scientific discovery, as demonstrated by EurekAgent's record-breaking results.
Joint image-depth generation can be achieved with a single model trained on sparse data, outperforming existing methods by a significant margin.
Active learning can cut the data needed for accurate dynamics discovery by focusing on the most informative regions, outperforming random sampling by a significant margin.
LLMs can lose over 30% of their accuracy when faced with misleading medical contexts, revealing a critical vulnerability in their decision-making processes.
Learning rules as visual-symbolic transitions rather than just language descriptions could revolutionize how we approach in-context reasoning in AI.
SKIM compresses procedural skills in LLMs by 30–60% without sacrificing performance, revolutionizing how we manage reusable natural-language skills.
Simple prompting techniques can transform LLMs into more reliable mirrors of human judgment, recovering the full spectrum of responses.
Direct token-level self-distillation can backfire, but Sibling-Guided Credit Distillation redefines credit assignment to enhance long-horizon tool use without amplifying harmful behaviors.
The journey from AGI to ASI could unfold through a series of incremental breakthroughs, challenging the notion of a single transformative event.
VIPIR achieves orders-of-magnitude higher throughput for private information retrieval while slashing communication and memory overheads, revolutionizing large-scale database privacy.
Retrieving the right prompts can boost LMM performance by up to 30%, challenging the assumption that similarity guarantees effectiveness in in-context learning.
High artifact detection rates in VLMs mask significant failures in contextual understanding, with top models misidentifying visual cues in over 46% of cases.
Achieving high-fidelity audio generation with just four sampling steps, AudioX-Turbo dramatically cuts inference costs while enhancing performance across multimodal tasks.
EgoEngine transforms human manipulation videos into actionable robot demonstrations, enabling zero-shot learning without real-world robot data.
Naively scaling test-time compute is wasteful; strategically allocating it with DIRECT can enhance embodied agent performance while slashing latency by up to 65%.
Decoupling modality processing in VLA models leads to a staggering 95.2% success rate in complex manipulation tasks, far surpassing traditional synchronous approaches.
Action-chunking policies can lead to premature robot assistance, but a novel steering method effectively mitigates this issue, enhancing collaboration efficiency.
State-of-the-art surgical robotics policies can be disrupted by adversarial attacks, leading to a staggering 61% drop in task success rates.
Real-time LLM-generated user personas can dramatically enhance viewer engagement by dynamically balancing existing interests with new content recommendations.
A million-scale dataset for identity-preserving video generation enables a new benchmark and a model that outperforms existing approaches with minimal parameter overhead.
M* achieves up to 2.9x lower real-time factor and 2.7x higher throughput for text-to-speech tasks, revolutionizing how we serve complex multimodal AI models.
PI-Hunter uncovers hidden prompt injection vulnerabilities in LLM agents that traditional defenses miss, revealing a critical gap in current security practices.
MTP acceptance rates can be dramatically improved by addressing entropy fluctuations, leading to up to 1.8x faster RL training.
Arbor's innovative approach to autonomous research enables a cumulative learning process that outperforms existing models by over 2.5 times in real-world tasks.
Adapter design can make or break coding performance in OpenClaw-style agents, with a full adapter boosting success rates by over 50 percentage points.
Semantic progress in dialogue can be quantified effectively without relying on large models, achieving human-level agreement on information gain across turns.
Recursive composition of verifiable environments can boost reasoning performance in RL by up to 3.1 points while using only a fraction of the original environments.
ATLAS achieves a staggering 5-10x increase in sample efficiency for discovering interpretable behavioral models, revolutionizing experimental design in cognitive science.
Behavioral INR reveals that self-supervised learning can effectively disentangle complex, overlapping policies from unlabeled behavioral data, outperforming traditional methods in high-dimensional settings.
Shifting credit assignment to fine-grained decision points boosts agentic RL performance by nearly 4 points, challenging the conventional focus on tool-call boundaries.
CHOP achieves superior performance on OOD tasks by utilizing a frozen ICON, revealing that interpretability and generalization can coexist in operator learning.
The "curse of precision" reveals how reliance on AI-generated content can degrade model performance by homogenizing training data.
Privilege-induced style drift can undermine reasoning model performance, but RLCSD effectively redirects the learning signal toward what truly matters: task-relevant tokens.
Achieving zero constraint violations without sacrificing generative quality, PolyFlow redefines the landscape of safe flow-based modeling in critical applications.
GENIE reveals that traditional metrics fail to capture the nuanced dimensions of novelty, offering a sharper lens for evaluating LLM creativity.
ProPlay allows agents to rehearse future actions using a structured procedure graph, leading to substantial improvements in self-evolution and environment understanding.
Transitioning to post-quantum cryptography can be streamlined through a novel API design that decouples key management from specific algorithms, allowing for effortless updates.
NavWAM turns visual foresight into executable robot actions, outperforming traditional planning methods in real-world navigation scenarios.
Y-BotFrame transforms quadruped robots into intelligent assistants that can understand and execute natural language commands in real-time.
Conditioning robot policies on a spatiotemporal feature map enables faster and more robust long-horizon mobile manipulation, outperforming traditional image-only approaches.
Truncated positional encodings in GNNs can drastically change their expressive capabilities, with mixed encodings outperforming single families on real-world tasks.
A unified assessment framework reveals hidden insights about agent performance, transforming how we evaluate AI systems.
LLMs exhibit substantial performance gaps in supramolecular chemistry, revealing critical areas for improvement in host-guest reasoning tasks.
Object-centric mask conditioning in MaskWAM dramatically improves policy performance, outperforming traditional WAMs by effectively reducing language ambiguity in complex environments.
GF-DiT achieves up to 6.01× throughput improvement and 95% latency reduction by dynamically adapting GPU parallelism in response to workload demands.
Energy-efficient training of neural networks may soon be achievable through innovative optical methods, as shown by the successful implementation of Equilibrium Propagation in a Spatial Photonic Ising Machine.
Fixed memory weights in neural operators can hinder performance, but AMGFNO's dynamic gating adapts memory usage to the input resolution, drastically improving accuracy.
Achieving a quantum-classical separation in measurement complexity could revolutionize how we approach chaotic systems in machine learning.
Despite the promise of generative adversarial networks, adding GAN-generated synthetic training data failed to improve mound detection accuracy on Mars.
ReSET boosts reasoning accuracy in large models while slashing inference latency, achieving a remarkable 2.5× speedup in critical decoding tasks.
LLM-derived rankings can now achieve near-human accuracy at a fraction of the cost, thanks to a new method that quantifies and calibrates uncertainty in evaluations.
The WHAR state of the art reveals a surprising distribution of performance across architectures, with compact models outperforming larger ones in deployment efficiency.
Abrupt qualitative changes in generative model outputs can be traced to geometric features in the data landscape, revealing critical points that dictate model behavior.
Interactions among federated learning students can significantly boost performance, enabling noisier participants to learn effectively with fewer samples.
CausalMoE not only sets a new benchmark for Granger causal discovery but also excels in few-shot learning, revealing the power of heterogeneous expert routing in complex temporal analyses.
Reasoning-aware retrieval can boost language model performance by surfacing diverse solution strategies that traditional methods overlook.
Agents-K1 transforms how we extract and reason about scientific knowledge, achieving superior performance in multi-hop reasoning tasks compared to existing methods.
AgentRivet fills a critical gap in particle physics analysis by automatically generating Rivet routines, improving the accessibility of model-independent measurements.
Foundation models may excel at forecasting, but their accuracy doesn't always translate to better resource allocation decisions in cloud environments.
Context-aware ASR error correction can be dramatically improved by leveraging a dynamically structured ontology memory, yielding more accurate and relevant corrections in long conversations.
Current LLM-based web agents are vulnerable to prompt-injection attacks, with no reliable defenses against any attack objective, revealing a critical oversight in security evaluations.
Autoregressive policies can achieve real-time execution with superior performance and speed, challenging the dominance of diffusion-based approaches.
Masking compositional concepts in one modality while leveraging contextual cues from another can dramatically enhance the compositionality of vision-language models.
Achieving superior multimodal image synthesis, DDE-GAN combines spatial and frequency learning with geometric consistency to revolutionize CT-PET imaging.
ARMOR-MAD achieves up to 96.5% accuracy in multi-agent debate tasks by dynamically routing debate processes, showcasing the power of adaptive computation in large language models.
Engagement with mental health content on TikTok reveals a stark contrast between creators' negative sentiment and audiences' more positive responses, especially in suicide prevention discussions.
LLMs exhibit a surprising phase transition in error patterns as semantic complexity increases, challenging conventional approaches to stance detection.
ProReviewer outperforms larger models by up to 39% in peer review quality by enabling proactive investigation of research papers.
Automatically generated Multi-Agent Systems are not only outperformed by Single-Agent Systems but also exhibit architectural inefficiencies that challenge the very foundations of multi-agent design principles.
MDForge not only matches human expertise in molecular dynamics pipeline design but also uncovers a novel high-affinity binder, showcasing the potential of LLMs in scientific discovery.
Transforming failures into focused training tasks boosts tool-using language model performance by over 8% on key benchmarks.
Optimal granularity in RAG benchmarks varies by dimension: question complexity is best captured at fine granularity, while other dimensions favor medium granularity.
VLMs can miss critical context despite localized attention, but simply enlarging visual spans can dramatically boost comprehension accuracy.
Line-number extraction outperforms rewriting strategies, achieving up to 95% term recall while minimizing hallucinations in safety-critical applications.
Small LLMs can outperform larger models in biomedical claim verification, achieving significant gains at a fraction of the cost.
Edge-level methods uncover how irrelevant numerical anchors influence language model judgments, revealing shared pathways that shift with model tuning.
Shapley-guided analysis reveals hidden vulnerabilities in multi-agent systems, enabling targeted and coordinated adversarial attacks that traditional methods miss.
Semantic ACE matching preserves IoT device identification accuracy even under challenging conditions, outperforming traditional methods when traffic patterns vary.
VLMs can outperform traditional embedding methods by serving as reliability-aware semantic auditors, boosting occupancy model accuracy for rare classes.
A novel control architecture reduces steady-state attitude error in underactuated spacecraft by integrating model predictive control with a physics-informed neural network and a Lyapunov safety layer.
Temporal conductance reveals that consensus in dynamic networks can be reached significantly faster than previously thought, challenging assumptions about static connectivity.
Foresight boosts skiplist throughput by up to 45%, revolutionizing cache efficiency in concurrent data structures.
Irregular membrane curvature in cancer cells boosts Piezo1 activity, enabling selective apoptosis under ultrasound, while healthy cells remain unaffected.