Search papers, labs, and topics across Lattice.
Efficient training methods, optimizer design, learning rate schedules, mixed precision, and gradient techniques.
#13 of 24
5
Achieve near-lossless 60% attention latency reduction in video editing by exploiting query sharpness to dynamically route attention.
Fine-tuning efficient few-step diffusion models no longer requires sacrificing their speed, thanks to a self-distillation approach that preserves inference capabilities.
Skip the sampling: accurately predict the behavior of wide, random MLPs with a fraction of the compute, especially when assessing rare, high-stakes outcomes.
GMD algorithms, previously seen as a novel generative framework, can be understood as directly targeting fixed points of Wasserstein Gradient Flows, offering a new perspective on their optimization process.
Modeling 10,000+ correlated outputs is now tractable: T-LVMOGP offers a scalable alternative to restrictive low-rank MOGPs by learning a flexible deep kernel in a shared embedding space.
Distributional regret bounds, which quantify the probability of exceeding different regret levels, are now achievable with a UCBVI-style algorithm, confirming a long-standing conjecture for multi-armed bandits.
Maximizing reward entropy by targeting a 50% pass rate in binary-reward RL unlocks significant speedups and performance gains in agentic tasks.
Doubly sparse regression gets a boost: this method avoids predictor duplication, saving compute, by projecting directly onto the intersection of selected groups.
Training MoE models just got a whole lot faster: Piper achieves up to 3.5x higher MFU by intelligently scheduling pipeline parallelism and optimizing communication.
Training data order matters more than you think: reordering your data can significantly improve unsupervised domain adaptation by reducing variance in domain discrepancy estimates.
Long-context models face a provable "impossibility triangle": you can't have efficiency, compactness, and unbounded recall *at the same time*.
Stop training in isolation: LNTrust lets decentralized models learn *who* to trust during training, so they can collaborate effectively at deployment, boosting accuracy and cutting communication costs.
Stop wasting time and resources on massive localization datasets: this framework achieves highly accurate outdoor localization by adaptively switching between offline and online learning strategies based on data availability.
Decoupling radial and angular dynamics in vision-language model adaptation unlocks significant gains in few-shot performance, outperforming existing flow matching methods.
Self-distillation can be more effective than learning from an external teacher, but only if you optimize for preference gaps instead of blindly matching the teacher's output distribution.
Scale multi-agent RL diversity metrics to hundreds of agents without sacrificing accuracy: Graph-SND offers a drop-in replacement for quadratic SND calculations, achieving near-identical results with order-of-magnitude speedups.
LLMs can now generate high-performance CUDA attention kernels that outperform hand-optimized code, thanks to a novel lift-transfer-lower approach that leverages expert knowledge.
Forget fine-tuning: "skill neologisms"—new soft tokens—let you inject skills into LLMs without weight updates, composing them zero-shot for flexible knowledge expansion.
ReLU network constraints can flip the script on whether adaptive querying helps in-context learning.
Observed sample displacements can be integrated into optimal transport to carve expressways through the input space, leading to more reliable modeling of distribution shifts.
Geometric continuity in deep networks isn't just a byproduct of depth, but an actively sculpted property arising from the interplay of residual connections and symmetry-breaking activations.
Spending up to 25% of your black-box optimization budget on feature computation for per-instance algorithm selection can still pay off, but optimizing that budget is key to unlocking PIAS's full potential.
Batch normalization's power comes from reshaping the geometry of neural network decision boundaries on a per-batch basis, not just from optimization benefits.
LLMs can be efficiently post-trained by only updating half the parameters, slashing memory costs without sacrificing performance.
LLMs can now generate neural architectures with 75% less code and higher accuracy by learning to write code "diffs" instead of building from scratch.
Unstable BO leaderboard rankings? They're likely due to ignoring the budget ratio (B/|A|) and prior rank correlation, which this paper elegantly captures with the Portable Regime Score (PRS) to predict performance reversals.
Neural operators can stably and accurately correct the structured truncation errors of classical numerical solvers for dispersive PDEs, even with rough data.
Physics-informed neural operators can now learn continually without forgetting, thanks to a simple replay strategy that preserves past knowledge while rapidly adapting to new out-of-distribution data.
Federated learning struggles when data quality varies across clients, but FedQual solves this with a novel approach that calibrates low-quality clients while preserving high-quality autonomy.
Incomplete one-hot encoding during FMQA's initial training phase can be overcome with space-filling sampling methods, leading to improved optimization performance.
Approximate computing can break MoEs in unexpected ways, with dense networks sometimes proving more robust, but careful retraining can unlock surprising efficiency gains in specific architectures.
Unlock white-box inference for SOC-ICNNs by directly reading out geometric primitives like Hessians from the optimal dual variables, bypassing black-box differentiation.
Incentivizing honest participation in federated learning is now possible without ground truth labels, even when some participants are trying to game the system.
Suppressing weight outliers via a Hessian-informed additive transformation unlocks >40% perplexity reduction in 2-bit quantized LLMs compared to standard GPTQ.
A hybrid AI model can boost corn yield predictions by up to 7.2%, offering a promising path to accelerate climate-adapted crop development.
Aligning random seeds across rollout simulations can significantly boost the performance of simulation-based planning, even in complex environments like Ludo.
Ditch the attention: ConvRec proves convolutional networks can beat Transformers in sequential recommendation while slashing compute and memory costs.
MoEs, despite their scaling advantages, suffer from a surprising "spectral plasticity loss" in continual RL, but a simple Parseval penalty can recover performance.
Fine-tune optimizer precision block-by-block and slash memory use without sacrificing model quality.
Forget brittle imitation learning: Q2RL unlocks robust on-robot reinforcement learning by distilling a Q-function from Behavior Cloning and intelligently gating between imitation and RL based on Q-value estimates.
Forget hand-crafted reward functions: this RL framework lets a bicycle robot learn complex stunts from just a spatial guideline and a few key poses.
Turns out you only need to tweak a few key audio tokens to jailbreak audio language models, opening the door to faster, more targeted attacks.
Synthesizing high-resolution satellite imagery with geometric precision is now more efficient, thanks to a windowed cross-attention method that rivals existing techniques while better respecting geometric constraints.
Audio diffusion models can be trained more efficiently by dynamically adjusting optimization strategies based on the evolving balance between semantic acquisition and fine-detail refinement during training.
Forget full fine-tuning: QLoRA on 7B models can match the perplexity of fully fine-tuned smaller models for low-resource languages, while slashing the parameter count by 40x.
Forget backprop and memory lookups: FAAST lets you adapt models at test time with a single forward pass, matching fine-tuning accuracy with massive speed and memory gains.
LLMs can retain 10x more of their original capabilities after fine-tuning, simply by using a dynamically adjusted "anchor" to constrain distributional drift during training.
Choosing between secure multi-party computation (SMPC) and fully homomorphic encryption (FHE) for secure ML depends heavily on the model architecture: FHE excels at regressions and simple networks, while SMPC dominates for complex CNNs.
RFT's Achilles heel? This benchmark reveals how fragile reinforcement fine-tuning is, and introduces an automated system to catch and fix training failures before they tank your LLM.
Get high-fidelity tactile simulations with 65% speedup and 40% less memory by combining coarse physics with neural implicit reconstruction.