Search papers, labs, and topics across Lattice.
We track OpenAI, DeepMind, Anthropic, and 17 other labs daily — with AI-powered summaries, trend charts, and a weekly digest.
We read everything so you don't have to. One email, zero noise.
Mixture-of-Experts models might be hiding more of their reasoning than we thought, thanks to a newly quantified "opaque serial depth" metric.
Transform unstructured audio-visual signals into machine-readable structured knowledge with the Logics-Parsing-Omni model, which enforces strict alignment between high-level semantics and low-level facts.
A 4B parameter model can now beat much larger models at social reasoning, thanks to a new RL framework that aligns model reasoning trajectories with human cognition.
LLMs still can't automate real-world threat research, struggling with accuracy and nuanced expertise in a new benchmark derived from a world-leading company's CTI workflow.
Rebuttals hold the key to actionable AI-generated peer reviews: RbtAct uses them to train LLMs to give feedback that authors actually use.
LLMs get *more* honest when they have time to reason, defying human tendencies and revealing surprising insights about their internal representational geometry.
Get 6x the RLHF alignment for your LLM with a new active learning pipeline that focuses on annotating the most informative response pairs.
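The paper's actual acquisition function isn't given here, but the core idea of annotating the most informative pairs can be sketched with a standard heuristic: under a Bradley-Terry reward model, a pair whose predicted preference probability is closest to 50/50 yields the most information per human label. Everything below (function name, selection rule) is an illustrative assumption, not the paper's method.

```python
import numpy as np

def select_informative_pairs(r_a, r_b, k):
    """Rank candidate response pairs by annotation informativeness.

    r_a, r_b: current reward-model scores for the two responses in each pair.
    Hypothetical selection rule: prefer pairs whose predicted preference
    probability is closest to 0.5 (i.e. the model is least sure which wins).
    """
    p = 1.0 / (1.0 + np.exp(-(r_a - r_b)))   # Bradley-Terry P(a preferred over b)
    uncertainty = -np.abs(p - 0.5)           # higher = closer to a coin flip
    return np.argsort(uncertainty)[::-1][:k]

r_a = np.array([2.0, 0.1, 5.0])
r_b = np.array([0.0, 0.0, -1.0])
top = select_informative_pairs(r_a, r_b, 1)  # pair 1 has the smallest margin
```

In this toy example the middle pair (score gap 0.1) is selected first, since its predicted preference is nearly 50/50.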
Dataset condensation, previously limited to neural networks, can now democratize access to clinical data by enabling privacy-preserving training of classical models like decision trees and Cox regression.
Zero-shot robotic manipulation is now within reach: TiPToP matches a 350-hour fine-tuned model without *any* robot data.
K-means gets a 17.9x speed boost on modern GPUs thanks to a clever redesign that avoids memory bottlenecks and atomic write contention.
By strategically increasing hash collisions, Nemo slashes write amplification in flash caches for tiny objects, a persistent bottleneck even with advanced SSDs.
DendroNNs offer a 4x energy efficiency boost over existing neuromorphic hardware by mimicking dendritic computation and training via a gradient-free rewiring mechanism.

Beat the state-of-the-art in radio signal separation by 122x using a transformer trained with a simple cross-entropy loss, and the same architecture could work for gravitational waves.
Unmasked policy gradient methods can inadvertently suppress valid actions in unvisited states, creating a hidden exploration bottleneck that masking neatly avoids.
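The masking fix the blurb refers to is mechanically simple: set invalid actions' logits to negative infinity before the softmax, so they receive exactly zero probability and zero gradient. A minimal sketch (function and example values are my own, not from the paper):

```python
import numpy as np

def masked_policy(logits, valid_mask):
    """Action-masked policy: invalid actions get -inf logits before softmax.

    Without the mask, policy-gradient updates can push probability mass
    away from valid-but-unvisited actions; with it, invalid actions are
    guaranteed zero probability and contribute no gradient.
    """
    masked = np.where(valid_mask, logits, -np.inf)
    exp = np.exp(masked - masked[valid_mask].max())  # stable softmax
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
valid = np.array([True, True, False, True])
probs = masked_policy(logits, valid)  # action 2 gets exactly 0 probability
```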
An AI agent can triage remote patient monitoring data with higher sensitivity than individual clinicians, suggesting a path to scalable and cost-effective patient monitoring.
Forget manual labeling: influence functions can automatically surface high-quality robot demonstrations, boosting policy performance by intelligently curating training data.
Medical multi-agent systems can reason deeply, but fall apart when switching between medical specialties, highlighting a critical need for more robust architectures.
Current Large Audio Language Models (LALMs) struggle with basic audio understanding tasks like noise localization and cross-lingual speech, with some performing worse than random chance, despite excelling at speech recognition.
Current ML benchmarks may be impossible to make ungameable, even in theory, as they can lack a stable equilibrium in which developers are incentivized to improve true model quality rather than just leaderboard scores.
Ditch the anchors and NMS: AutoReg3D reimagines 3D object detection as a sequence generation problem, opening the door for language-model techniques in 3D perception.
By dynamically adjusting contrastive learning temperatures based on data density, MM-TS achieves state-of-the-art results on multimodal long-tail datasets.
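MM-TS's published loss isn't reproduced here, but the density-adaptive-temperature idea can be illustrated: estimate each anchor's local density from its k-nearest-neighbour distances, then give dense (head-class) anchors a sharper temperature and sparse (tail-class) anchors a softer one inside an InfoNCE loss. The mapping from density to temperature below is an illustrative guess.

```python
import numpy as np

def density_adaptive_infonce(z_a, z_b, k=3, t_min=0.05, t_max=0.5):
    """InfoNCE with a per-anchor temperature driven by local density.

    z_a, z_b: row-normalized embeddings from two modalities; row i of
    each is a positive pair. Sparse anchors (large mean k-NN distance)
    get a larger temperature. Illustrative sketch, not MM-TS's exact loss.
    """
    sim = z_a @ z_b.T                              # cosine similarities
    d = 1.0 - (z_a @ z_a.T)                        # intra-modality distances
    np.fill_diagonal(d, np.inf)                    # ignore self-distance
    knn = np.sort(d, axis=1)[:, :k].mean(axis=1)   # mean k-NN distance
    dens = (knn - knn.min()) / (np.ptp(knn) + 1e-8)
    tau = t_min + (t_max - t_min) * dens           # sparse -> larger tau
    logits = sim / tau[:, None]
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))                 # NLL of the positives

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
z /= np.linalg.norm(z, axis=1, keepdims=True)
loss = density_adaptive_infonce(z, z)
```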
By closing the loop with explicit planning and feedback, SPIRAL overcomes the temporal drift and weak semantic grounding plaguing one-shot video generation models.
Forget local semantic alignment: CAST unlocks temporally coherent video retrieval and generation by explicitly modeling visual state transitions.
Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.
Forget catastrophic forgetting: this function-preserving expansion method lets you fine-tune without sacrificing pre-trained knowledge, matching full fine-tuning performance at a fraction of the cost.
Even the best open-weight LLMs still fail on nearly two-thirds of questions requiring reasoning over scientific tables, highlighting a persistent "execution bottleneck" in translating strategy to action.
Scale qualitative analysis of educational discourse data without sacrificing rigor using a mixed-initiative system that orchestrates LLMs and human expertise.
LLM-powered diagnostic AI is ready for prime time: a real-world clinical trial shows it's safe, patients love it, and doctors find it useful.
Lockbox offers a practical blueprint for enterprises to adopt cloud-based AI processing on sensitive data without compromising security, by implementing a zero-trust architecture.
Foley-Flow achieves state-of-the-art video-to-audio generation by aligning audio-visual representations with masked modeling, enabling rhythmic synchronization that was previously lacking.
Tangible interaction with robots backfires for users with negative attitudes, who prefer a digitally mediated interface as a social buffer.
LLMs can generate better recommendations if they pause to verify their reasoning steps, rather than reasoning in one long chain.
Forget massive datasets – targeted training on a smaller, carefully curated dataset of challenging competitive programming problems yields 3x faster gains in code generation performance.
Text-to-image customization can now preserve the original model's behavior, thanks to a decoupled learning objective that balances new concepts with pre-existing capabilities.
Training trillion-parameter Mixture-of-Experts models just got a whole lot faster: Megatron Core now achieves >1 PFLOP/GPU on NVIDIA's latest hardware.
Most social media platforms govern AI-generated content by simply applying existing content moderation policies, leaving key issues like ownership and monetization largely unaddressed.
Ditch the slow sampling dance of diffusion models: Variational Flow Maps let you condition image generation in a single pass by learning the right initial noise.
Forget direct prompt editing: this agentic planning framework, powered by offline RL and synthetic data, masters complex image styling by breaking it down into interpretable tool sequences.
AI-generated videos can now respect physics, thanks to a framework that uses a physical simulator to guide diffusion models, resulting in more realistic and coherent motion.
Accelerate video generation by 45% without retraining, simply by pruning redundant latent patches and cleverly recovering attention scores.
LLMs writing long stories frequently contradict themselves on basic facts and timelines, especially in the middle of the narrative, highlighting a critical weakness in long-form generation.
You can accurately predict the NDCG of a 1B-parameter reranking model by only training models up to 400M parameters, unlocking massive compute savings.
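The paper's functional form isn't stated in the blurb; a common assumption for this kind of extrapolation is that the gap to perfect NDCG shrinks as a power law in model size, which is linear in log-log space and can be fit with ordinary least squares. The numbers below are hypothetical placeholders.

```python
import numpy as np

# Hypothetical NDCG@10 scores for small rerankers (sizes in millions of params).
sizes = np.array([50.0, 100.0, 200.0, 400.0])
ndcg = np.array([0.62, 0.66, 0.69, 0.715])

# Assumed scaling law: 1 - NDCG(N) ~ b * N^(-c), i.e. log(1 - NDCG) is
# linear in log N, so a degree-1 polyfit in log-log space recovers (b, c).
slope, intercept = np.polyfit(np.log(sizes), np.log(1.0 - ndcg), 1)

def predict(n_millions):
    """Extrapolate NDCG to a larger model under the fitted power law."""
    return 1.0 - np.exp(intercept) * n_millions ** slope

ndcg_1b = predict(1000.0)  # predicted score for a 1B-parameter reranker
```

With these placeholder scores, the fit predicts roughly 0.75 NDCG at 1B parameters; the point is only that four cheap training runs pin down the curve.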
Achieve stable and competitive quantization for multimodal LLMs by explicitly accounting for modality-specific characteristics and cross-modal computational differences.
A 4B parameter SLM can now rival frontier agent performance in complex tool-use environments, thanks to a novel reinforcement finetuning framework that teaches it how to strategically acquire context and execute actions.
Forget scaling compute – the future of AI hinges on a 1000x leap in energy efficiency via tight AI+Hardware co-design over the next decade.
Aura unlocks more accurate aviation time series forecasting by explicitly modeling how different types of external factors interact with temporal dynamics.
Forget unimodal tasks—UniM throws down the gauntlet for truly unified multimodal AI, demanding models juggle any combination of text, image, audio, video, code, documents, and 3D inputs and outputs in a single, interleaved stream.
Pre-normalization in Transformers is the culprit behind the mysterious link between massive activation outliers and attention sinks, but decoupling them reveals their distinct functions: global parameterization vs. local attention modulation.
Finally, a standardized benchmark to rigorously compare heterogeneous treatment effect estimation methods in survival analysis, revealing performance nuances across diverse datasets and assumption violations.
Unlock up to 59x cost reductions in optimization by pretraining ML surrogates with cheap, imperfect labels and then refining them with self-supervision.
Forget laboriously sifting through layers or datasets for PEFT: GAST co-optimizes both, adaptively picking the most impactful data for each layer based on gradient alignment.
Imagine writing a script that *is* the video editor: Doki lets you do just that, turning text into a powerful interface for generative video authoring.
By respecting the intrinsic geometry of the probability simplex, $\alpha$-GaBO significantly outperforms standard Bayesian optimization in tasks involving probabilities and mixtures.
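$\alpha$-GaBO's actual geometry is not reproduced here, but the baseline problem it addresses is easy to show: standard Bayesian optimization treats simplex points as unconstrained Euclidean vectors. One common workaround is to reparameterize via the additive log-ratio transform, which maps the simplex bijectively to $\mathbb{R}^{d-1}$ so any off-the-shelf optimizer can search an unconstrained space. Function names are my own.

```python
import numpy as np

def alr(p, eps=1e-9):
    """Additive log-ratio transform: simplex point -> unconstrained R^(d-1).

    Uses the last coordinate as the reference component.
    """
    p = np.clip(p, eps, None)
    return np.log(p[:-1] / p[-1])

def alr_inv(x):
    """Inverse transform: unconstrained coordinates -> probability simplex."""
    e = np.exp(np.append(x, 0.0))  # reference component maps to exp(0) = 1
    return e / e.sum()

p = np.array([0.2, 0.3, 0.5])
p_back = alr_inv(alr(p))  # round-trips exactly to the original mixture
```

A surrogate fit in the transformed space always proposes valid mixtures, though (unlike a method that models the simplex geometry directly) the transform distorts distances near the simplex boundary.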
Muon's "one-size-fits-all" spectral update is holding back your models: Mousse adapts to curvature and cuts training time by 12%.
Panoramic vision-language models can achieve a level of holistic scene understanding and robustness in adverse conditions that's impossible for traditional pinhole-based VLMs.
Reasoning in LLMs isn't just for complex tasks: it can unlock surprisingly better recall of simple facts, but beware – hallucinated reasoning steps can backfire and increase overall hallucination.
Achieve real-time (31 FPS) video generation with a 277x speedup by distilling autoregressive models, thanks to a novel "diagonal distillation" strategy that cleverly manages temporal context and noise prediction.
LLMs trained with reinforcement learning become overconfident in wrong answers due to a fundamental conflict between accuracy and calibration objectives, but this can be fixed by decoupling these objectives during training.
A 4B-parameter model, InternVL-U, punches above its weight, outperforming 14B-parameter models in multimodal generation and editing by using a novel data synthesis pipeline and architecture.
MLLMs can bomb at math when text is rendered as an image, but a clever self-distillation trick can boost accuracy from 30% to 92%.
Finally, a standardized benchmark to rigorously evaluate how well models generalize carbon flux predictions to geographically distinct ecosystems they've never seen before.
A hierarchical graph attention network beats traditional machine learning models by 21% in predicting spectrum demand, offering a more reliable approach to spectrum management.
Physics-informed neural operators can drastically improve the accuracy and stability of phase-field modeling, outperforming standard neural operators in complex materials simulations.
Tighter privacy guarantees and higher utility in language models are simultaneously achievable via a principled parameter clipping strategy for Nonparametric Variational Differential Privacy.
Even a single error from a conditional independence oracle can prevent the unique identification of a Bayesian network structure, regardless of bounded graph parameters like treewidth.
ConvNets strike back: a ConvNeXt-based diffusion model matches Transformer performance at half the FLOPs and 7x faster training, all on just 4 GPUs.
Correcting systematic errors in aggregate data is now possible by using proxy variables to disentangle true signals from biases via a VAE-based framework.
GP Thompson Sampling's reliance on probability $\delta$ dooms it to polynomial regret, a stark contrast to GP-UCB's more favorable bounds.
Optimal transport provides a surprisingly tight and efficiently computable bound on transductive generalization in graph node classification, revealing how GNN depth impacts representation geometry.
Forget gradients: this new sampler learns complex distributions, even with discrete parameters, by enforcing time-reversibility and comparing forward and backward Markov trajectories.
Forget test-time training: this work bakes optimal control directly into LLMs, yielding up to 27.8% gains in mathematical reasoning.
LLM-powered VR guides for blind and low vision users are not just tools, but social actors, prompting users to give them nicknames and rationalize their mistakes when others are present.
Stop letting simulator errors in critical regions derail your policies: Sim2Act aligns surrogate fidelity with downstream decision impact, leading to more stable and robust decision-making.
Stop treating concept drift as one thing: DynaME's hybrid approach, separating recurring and emergent drifts, unlocks better online time series forecasting.
Accurately predicting spectrum demand across urban areas is now possible, with a model that captures 70% of the variability, paving the way for more efficient 6G network management.
Forget hand-crafted heuristics: this new dynamics-aware policy learns to exploit contact forces in cluttered environments, outperforming traditional methods by 25% in simulation and showing impressive sim-to-real transfer.
Simulation-based inference can improve neutrino interaction model tuning beyond traditional methods, even suggesting parameter values that better fit experimental data.
LLMs exhibit gender bias in healthcare scenarios by relying on stereotypes when reasoning about patient records, revealing the need to evaluate interactions among social determinants of health to assess LLM performance and bias.
RoadLogic automates the creation of diverse, realistic autonomous vehicle test scenarios from declarative specifications, sidestepping the manual effort of imperative approaches.
Even with ample tokens, "thinking" models don't always ace associative creativity, suggesting current prompting strategies only scratch the surface of unlocking LLMs' creative potential.
By explicitly optimizing for both reasoning structure and chemical consistency, Logos offers a pathway to reliable and interpretable AI systems for molecular science, outperforming larger models with a fraction of the parameters.
Retrieval-augmented agents get a serious reasoning boost by explicitly evaluating their own retrieval quality at each step, leading to state-of-the-art performance on multi-hop question answering.
SLLMs struggle with spoken prompts compared to text, especially in low-resource languages, highlighting a critical gap in current evaluation methodologies.
Stop wrestling with evaluation codebases: One-Eval automates LLM evaluation from natural language requests, handling benchmark selection, dataset normalization, and metric reporting with minimal user effort.
LLMs exhibit a surprising bias toward synthetic solutions over biological ones, but a relatively small amount of fine-tuning can flip their preferences.
Securing vulnerable cross-compartment interfaces may be possible with a new APR framework that bridges the compartmentalization awareness gap in existing LLMs.
Privacy-preserving LLM insight systems like Anthropic's Clio can be tricked into leaking a user's medical history with just a single symptom and basic demographics, even with layered heuristic defenses.
Meta Pixel's default settings lead to near-ubiquitous tracking of user activity and identity, even on health-related websites, while advertised tracking restrictions are easily bypassed.
ProvAgent slashes the cost of reconstructing near-complete attack processes to just $0.06 per day by replacing human analysts with a multi-agent system for threat investigation.
Reverse image search, a key tool for visual fact-checking, often amplifies misinformation and irrelevant content, burying debunking information.
Backdoor defenses focused on removing training triggers are fundamentally flawed, as alternative, perceptually distinct triggers can reliably activate the same backdoor via a latent feature-space direction.
Unlock scalable, privacy-preserving substring analysis with an algorithm that slashes space and time complexity while maintaining near-optimal accuracy.
LVLMs can be jailbroken by "Reasoning-Oriented Programming," which chains together harmless visual inputs to trigger harmful reasoning, much like return-oriented programming in traditional security exploits.
LLMs can now generate UML diagrams from requirements with human-level quality, potentially automating a resource-intensive phase in software design.
By fusing confidence-weighted point cloud projections with a Kalman-inspired update mechanism, ConfCtrl enables diffusion models to generate geometrically consistent novel views from sparse inputs, even under significant viewpoint shifts.
By converting point clouds into a format VLMs can understand, VLM-Loc significantly boosts text-to-point-cloud localization accuracy, outperforming prior methods that rely on shallower text-point cloud correspondences.
By translating visual observations into language, LAP achieves state-of-the-art procedure planning by disambiguating visually similar actions, outperforming vision-only methods.
Achieve robust surgical video question answering by injecting temporal awareness into parameter-efficient fine-tuning, outperforming standard PEFT methods on out-of-template questions.
Mimicking human eye movements with a Vision Transformer's attention maps yields a surprisingly effective and efficient image classification strategy.
Despite diverse formulations, ToF NLOS imaging methods hit similar performance walls in resolution and noise sensitivity when hardware is held constant, suggesting diminishing returns from algorithmic improvements alone.