96 papers published across 8 labs.
Achieve a near-lossless 60% reduction in attention latency for video editing by exploiting query sharpness to dynamically route attention.
Fine-tuning efficient few-step diffusion models no longer requires sacrificing their speed, thanks to a self-distillation approach that preserves inference capabilities.
Skip the sampling: accurately predict the behavior of wide, random MLPs with a fraction of the compute, especially when assessing rare, high-stakes outcomes.
GMD algorithms, previously seen as a novel generative framework, can be understood as directly targeting fixed points of Wasserstein Gradient Flows, offering a new perspective on their optimization process.
Modeling 10,000+ correlated outputs is now tractable: T-LVMOGP offers a scalable alternative to restrictive low-rank MOGPs by learning a flexible deep kernel in a shared embedding space.
Distributional regret bounds, which quantify the probability of exceeding different regret levels, are now achievable with a UCBVI-style algorithm, confirming a long-standing conjecture for multi-armed bandits.
Maximizing reward entropy by targeting a 50% pass rate in binary-reward RL unlocks significant speedups and performance gains in agentic tasks.
Doubly sparse regression gets a boost: this method avoids predictor duplication, saving compute, by projecting directly onto the intersection of selected groups.
Training MoE models just got a whole lot faster: Piper achieves up to 3.5x higher MFU by intelligently scheduling pipeline parallelism and optimizing communication.
Training data order matters more than you think: reordering your data can significantly improve unsupervised domain adaptation by reducing variance in domain discrepancy estimates.
Long-context models face a provable "impossibility triangle": you can't have efficiency, compactness, and unbounded recall *at the same time*.
Stop training in isolation: LNTrust lets decentralized models learn *who* to trust during training, so they can collaborate effectively at deployment, boosting accuracy and cutting communication costs.
Stop wasting time and resources on massive localization datasets: this framework achieves highly accurate outdoor localization by adaptively switching between offline and online learning strategies based on data availability.
Decoupling radial and angular dynamics in vision-language model adaptation unlocks significant gains in few-shot performance, outperforming existing flow matching methods.
Self-distillation can be more effective than learning from an external teacher, but only if you optimize for preference gaps instead of blindly matching the teacher's output distribution.
Scale multi-agent RL diversity metrics to hundreds of agents without sacrificing accuracy: Graph-SND offers a drop-in replacement for quadratic SND calculations, achieving near-identical results with order-of-magnitude speedups.
LLMs can now generate high-performance CUDA attention kernels that outperform hand-optimized code, thanks to a novel lift-transfer-lower approach that leverages expert knowledge.
Forget fine-tuning: "skill neologisms"—new soft tokens—let you inject skills into LLMs without weight updates, composing them zero-shot for flexible knowledge expansion.
ReLU network constraints can flip the script on whether adaptive querying helps in-context learning.
Observed sample displacements can be integrated into optimal transport to carve expressways through the input space, leading to more reliable modeling of distribution shifts.
Geometric continuity in deep networks isn't just a byproduct of depth, but an actively sculpted property arising from the interplay of residual connections and symmetry-breaking activations.
Spending up to 25% of your black-box optimization budget on feature computation for per-instance algorithm selection can still pay off, but optimizing that budget is key to unlocking PIAS's full potential.
Batch normalization's power comes from reshaping the geometry of neural network decision boundaries on a per-batch basis, not just from optimization benefits.
LLMs can be efficiently post-trained by only updating half the parameters, slashing memory costs without sacrificing performance.
LLMs can now generate neural architectures with 75% less code and higher accuracy by learning to write code "diffs" instead of building from scratch.
Unstable BO leaderboard rankings? They're likely due to ignoring the budget ratio (B/|A|) and prior rank correlation, which this paper elegantly captures with the Portable Regime Score (PRS) to predict performance reversals.
Neural operators can stably and accurately correct the structured truncation errors of classical numerical solvers for dispersive PDEs, even with rough data.
Physics-informed neural operators can now learn continually without forgetting, thanks to a simple replay strategy that preserves past knowledge while rapidly adapting to new out-of-distribution data.
Federated learning struggles when data quality varies across clients, but FedQual solves this with a novel approach that calibrates low-quality clients while preserving high-quality autonomy.
Incomplete one-hot encoding during FMQA's initial training phase can be overcome with space-filling sampling methods, leading to improved optimization performance.
Approximate computing can break MoEs in unexpected ways, with dense networks sometimes proving more robust, but careful retraining can unlock surprising efficiency gains in specific architectures.
Unlock white-box inference for SOC-ICNNs by directly reading out geometric primitives like Hessians from the optimal dual variables, bypassing black-box differentiation.
Incentivizing honest participation in federated learning is now possible without ground truth labels, even when some participants are trying to game the system.
Suppressing weight outliers via a Hessian-informed additive transformation unlocks >40% perplexity reduction in 2-bit quantized LLMs compared to standard GPTQ.
A hybrid AI model can boost corn yield predictions by up to 7.2%, offering a promising path to accelerate climate-adapted crop development.
Aligning random seeds across rollout simulations can significantly boost the performance of simulation-based planning, even in complex environments like Ludo.
Ditch the attention: ConvRec proves convolutional networks can beat Transformers in sequential recommendation while slashing compute and memory costs.
MoEs, despite their scaling advantages, suffer from a surprising "spectral plasticity loss" in continual RL, but a simple Parseval penalty can recover performance.
Fine-tune optimizer precision block-by-block and slash memory use without sacrificing model quality.
Forget brittle imitation learning: Q2RL unlocks robust on-robot reinforcement learning by distilling a Q-function from Behavior Cloning and intelligently gating between imitation and RL based on Q-value estimates.
Forget hand-crafted reward functions: this RL framework lets a bicycle robot learn complex stunts from just a spatial guideline and a few key poses.
Turns out you only need to tweak a few key audio tokens to jailbreak audio language models, opening the door to faster, more targeted attacks.
Synthesizing high-resolution satellite imagery with geometric precision is now more efficient, thanks to a windowed cross-attention method that rivals existing techniques while better respecting geometric constraints.
Audio diffusion models can be trained more efficiently by dynamically adjusting optimization strategies based on the evolving balance between semantic acquisition and fine-detail refinement during training.
Forget full fine-tuning: QLoRA on 7B models can match the perplexity of fully fine-tuned smaller models for low-resource languages, while slashing the parameter count by 40x.
Forget backprop and memory lookups: FAAST lets you adapt models at test time with a single forward pass, matching fine-tuning accuracy with massive speed and memory gains.
LLMs can retain 10x more of their original capabilities after fine-tuning, simply by using a dynamically adjusted "anchor" to constrain distributional drift during training.
Choosing between secure multi-party computation (SMPC) and fully homomorphic encryption (FHE) for secure ML depends heavily on the model architecture: FHE excels at regressions and simple networks, while SMPC dominates for complex CNNs.
RFT's Achilles' heel? This benchmark reveals how fragile reinforcement fine-tuning is, and introduces an automated system to catch and fix training failures before they tank your LLM.
Get high-fidelity tactile simulations with 65% speedup and 40% less memory by combining coarse physics with neural implicit reconstruction.
Forget dataset-specific hacks: CPCANet achieves SOTA domain generalization by explicitly learning a structured, domain-invariant subspace with a differentiable CPCA layer.
Forget PEFT and KD, reprogramming distillation offers a surprisingly effective and robust way to adapt large medical foundation models to diverse downstream tasks.
Autonomous driving gets a boost: CRAFT cleverly combines the best of both worlds – dense counterfactual supervision and grounded closed-loop feedback – to significantly improve driving policies.
Achieve real-time bipedal walking control by cleverly swapping high-fidelity for low-fidelity models in MPC, slashing computation without sacrificing stability.
Implicit time integration on GPUs gets a 3x speed boost thanks to a novel algebraic coarsening method that avoids costly explicit remeshing.
Unlock near-oracle speech enhancement performance from compact microphone arrays by virtually expanding their spatial coverage with a novel neural network.
Hand-eye calibration gets a 67% accuracy boost in high-uncertainty scenarios thanks to a new optimization framework that cleverly avoids explicit uncertainty modeling.
Make your prompts 5x more interpretable without hurting accuracy: IPL combines discrete token selection with continuous optimization, and it's plug-and-play with existing methods.
Exploiting temporal continuity and feature deviations in wearable sensor data lets you adapt activity recognition models on the fly, boosting accuracy while slashing compute costs.
Quickly sanitize your engagement recognition models after training: subject-level unlearning recovers ~90% of retraining benefits at 25% of the cost.
Forget full fine-tuning: LoRA lets you adapt Geospatial Foundation Models for wildfire mapping with comparable accuracy while only tweaking 1% of the parameters.
Forget ImageNet – pre-training with chaotic augmentations yields surprisingly robust texture features, outperforming SOTA methods across diverse texture datasets.
Get expert-level feedback on your performance, not just a score, thanks to a new approach that uses language generation for proficiency estimation.
Stop wasting compute on unreliable rollouts and easy frames: Stream-R1 adaptively focuses video diffusion distillation where it matters most, boosting quality without architectural changes or added inference cost.
Get 4x faster LLM inference with Budgeted LoRA, which smartly redistributes compute between dense and low-rank pathways during distillation, outperforming standard LoRA in both speed and function-style in-context learning.
Domain match and language relatedness trump joint vocabularies for effective knowledge transfer in multilingual NMT.
SATFormer shows that selectively gating access to early-layer representations boosts Transformer performance, especially in retrieval tasks, without sacrificing efficiency.
Forward-Forward learning can finally compete with backpropagation on complex image tasks, thanks to a novel covariance-aware goodness function that captures crucial second-order feature dependencies.
Forget iterative approximations – this work delivers globally optimal solutions for unbalanced optimal transport between Gaussians via a clever reduction to finite-dimensional optimization.
Differentiating through physical simulations just got a whole lot easier: Neural Control avoids unrolling iterative solvers by using an adjoint formulation, enabling memory-efficient gradient-based control.
AI training jobs can now shrug off network failures that used to halt progress, thanks to a new resilient networking stack deployed at OpenAI and Microsoft.
Forget simplistic roofline models: these analytical models nail GPU performance prediction on Blackwell and CDNA3 with under 1.5% error.
Standard federated learning deployments can catastrophically fail with just 5-second latency or 50% packet loss, revealing a fundamental mismatch between FL's communication patterns and default TCP configurations.
Analyzing exascale performance bottlenecks just got hundreds of times faster, thanks to a new GPU-accelerated framework that pinpoints congestion and predicts optimization opportunities in scientific workloads.
Make your ASR models 25% more accent-robust with this surprisingly simple contrastive loss trick.
Active learning guided by transition path sampling overcomes the limitations of machine-learned potentials in transition-state regions, enabling accurate and efficient simulation of rare events without prior mechanistic knowledge.
Pretrained MLIPs already encode sufficient information in their latent spaces to guide active learning, enabling efficient fine-tuning without uncertainty quantification.
Stem retrieval accuracy leaps forward by 70% thanks to a new architecture that finally respects the phase of music.
Guaranteeing stable beamforming in dynamic acoustic environments is now possible with a novel adaptive diagonal loading method that strictly bounds White Noise Gain.
Multimodal graph unlearning doesn't have to destroy utility: carefully protecting high-dimensional input projections during the unlearning process preserves performance while still enabling effective forgetting.
Achieve better image denoising without clean data or precise noise models by statistically refining existing denoisers.
Conformal prediction offers a surprisingly effective way to handle both modality imbalance and noisy corruption in multimodal learning by explicitly modeling predictive uncertainty during training.
Ditch the overly conservative error bounds: a new probabilistic approach to floating-point analysis delivers speed and precision by cleverly taming Taylor expansions.
Rust developers can slash the noise in static analysis alerts by over 50% using an RL agent that learns to suppress false positives, outperforming even LLM-based methods.
Multi-turn RL agents can learn far more effectively by explicitly monitoring and controlling uncertainty at both the token and turn levels, leading to more stable training and higher performance.
Quantum circuit optimization doesn't always improve distributed execution: sometimes, local optimization surprisingly beats global methods at minimizing communication costs.
Bayesian optimization can automatically tune Hyperledger Fabric configurations to achieve double-digit throughput improvements, but the impact of measurement noise on interpreting gains cannot be ignored.
FedPLT achieves full-model accuracy in federated learning while training up to 82% fewer parameters per client, slashing communication costs and enabling participation from resource-constrained devices.
FedQueue tackles the Achilles' heel of federated learning on HPC clusters, unpredictable queue delays, by explicitly modeling and mitigating their impact, leading to significant speedups.
Strong differential privacy can cause speech classifiers to collapse into near-useless single-class predictors, but a two-stage training process involving distillation can stabilize training.
Hierarchical power allocation in datacenters can achieve near-perfect satisfaction ratios, even with oversubscription, by using a novel three-phase QP/LP optimization policy.
Optimizing for runtime in multimodal training can be energy-inefficient, as data movement and overlap on Grace Hopper chips dominate energy consumption, not raw compute.
Attention bottlenecks in long-context decoding? SANTA slashes memory bandwidth demands by stochastically sampling value vectors, achieving 1.5x speedups without sacrificing accuracy.
Attention might just be a cleverly disguised MLP: this work shows you can ditch the quadratic complexity and still get Transformer-level performance by dynamically predicting parameters in standard network layers.
Decision trees and diffusion models are secretly doing the same thing: optimizing a shared objective called Global Trajectory Score Matching.
Jointly training the tokenizer and autoregressive model slashes ImageNet FID to 1.48, finally making end-to-end autoregressive image generation competitive.