Search papers, labs, and topics across Lattice.
100 papers published across 7 labs.
Laplacian DP and adaptive quantization can slash federated learning communication costs by over 50% without sacrificing accuracy or privacy, even with non-IID data.
Compressing multi-dimensional human preferences into single binary labels severely degrades DPO training, but a semi-supervised approach can recover state-of-the-art performance without additional human annotation.
Guaranteeing physical constraints in your ML model doesn't have to sacrifice uncertainty quantification – this Bayesian method bakes in linear equalities while shrinking credible intervals.
Standard black-box optimization falls apart when deploying ML models under tight constraints in crash-prone environments; TBA offers a robust, feasible-first alternative that actually works.
Spark Policy Toolkit unlocks scalable policy learning in Spark by guaranteeing consistent results even with distributed execution, finally making it possible to apply complex policy learning techniques to large datasets.
The secret to effectively pruning LLMs might not be *how* you search for redundant layers, but *what* you're optimizing for.
Achieve dynamic regret bounds for online regression in an RKHS by combining discounted VAW with finite-dimensional subspace approximations, offering a practical approach for competing against time-varying comparators.
Edge devices can now achieve up to 494x faster certified robustness with Laplace-Bridged Smoothing, making formally verified AI deployments practical in resource-constrained settings.
Training LLMs to explicitly optimize for how they're *actually* used at inference time unlocks substantial performance gains compared to standard fine-tuning.
Forget perturbation theory: HAML meta-learns effective qubit Hamiltonians directly from multi-mode simulations, enabling accurate characterization even when traditional methods break down.
Learning generative models for high-dimensional matrices doesn't have to be a computational nightmare: CoreFlow achieves state-of-the-art results in low-data regimes by learning shared low-rank structure.
Forget training from scratch: HyLo lets you breathe new (long-context) life into your existing Transformer LLMs, achieving 32x context extension and 90% KV-cache reduction.
Multi-anchor word embeddings, previously impractical for LLMs, can now outperform standard embeddings with 98% fewer parameters and a 40x smaller embedding layer.
Forget hand-crafted examples: this system automatically generates worked examples tailored to student errors by mining common code patterns.
Split learning offers a surprisingly viable path to fine-tuning LLMs on sensitive data without breaking the bank or sacrificing privacy.
Catastrophic overfitting in fast adversarial training isn't just overfitting – it's a backdoor, and now we can use backdoor defenses to fix it.
Low-confidence training samples are secretly sabotaging your fast adversarial training, leading to catastrophic overfitting and a worse robustness-accuracy trade-off.
Zero-shot Sim2Real transfer for a humanoid ballbot is now possible thanks to a friction-aware RL framework and high-fidelity simulation that models omni-wheel mechanics.
Autonomous vehicles can drive more efficiently by using a new metric that links real-time acceleration decisions to overall travel time.
Solving massive optimization problems just got a whole lot faster: SDSL-Solver achieves up to 97x speedups over PARDISO by distributing sparse linear system solves across multiple nodes.
Sequence recommendation models can achieve near-perfect scaling efficiency in distributed training, slashing wasted GPU cycles by up to 90%.
Stop paying a 55% performance-per-dollar premium: KubePACS optimizes Kubernetes spot instance provisioning for cost, performance, and availability, blowing away existing solutions.
FlashOverlap shatters the tail latency bottleneck in distributed LLM training by orchestrating peer-to-peer communication with fine-grained computation overlap.
FPGA CAD tools waste enormous time re-checking the same cluster packings, but a simple memoization trick can slash runtime by up to 29x.
Computation can now design light-activated drugs: a novel compound achieved a 15x boost in cancer target inhibition upon green light exposure.
Ditch the slow sampling: DriftSE achieves state-of-the-art speech enhancement in a single step, outperforming diffusion models with a novel equilibrium-based approach.
Ditch the prompts: DiffuSAM adapts SAM2 for medical image segmentation by synthesizing mask embeddings with a diffusion model, achieving strong performance without fine-tuning or expert input.
Sub-linear attention is now possible without sacrificing complete long-range dependency retention, thanks to learnable summary tokens that compress context.
Storing user interaction histories in a normalized, immutable tier and reconstructing sequences just-in-time slashes data infrastructure costs and unlocks the potential of ultra-long sequence DLRMs.
Achieve real-time learning-based control of complex robotic systems by exploiting differential flatness for dramatic speedups in MPC computation.
Forget slow, multi-step action generation: CF-VLA's coarse-to-fine approach slashes latency by 75% while boosting real-robot success rates to a new high of 83%.
Achieve millisecond-level 3D point cloud reconstruction from a single image without sacrificing quality, blowing past diffusion model latency.
Vanilla on-policy distillation falls apart in multi-turn settings due to compounding errors, but a simple curriculum on trajectory length fixes it, even letting students beat their teachers.
LLMs can be systematically debugged and improved by treating training data as code, allowing for targeted "patches" that fix concept-level gaps and reasoning errors.
Shrinking massive audio foundation models by up to 61x is now possible without significant performance loss, thanks to a novel self-supervised distillation approach that works directly on embeddings.
Squeezing intermediate tensors with FP8 quantization and adaptive transforms can nearly double the throughput of tensor-parallel LLM training without sacrificing accuracy.
ELBO-based reinforcement learning, previously dismissed for visual generation, can actually outperform MDP-based methods for aligning denoising generative models with human preferences.
Generative AI evaluation can be sped up by 8-65x without sacrificing accuracy by proactively focusing on the most informative test cases using a pre-trained Gaussian Process surrogate model.
FlowAnchor makes flow-based video editing robust to multi-object scenes and long sequences by stabilizing the editing signal, opening the door to more complex and controllable video manipulation.
The best continual learning method for your task might depend more on *how much* of the model you fine-tune than *which* regularization strategy you use.
Signal processing offers a surprisingly effective lens for understanding and improving LoRA, the reigning champ of parameter-efficient fine-tuning.
Forget painstakingly tuning RL algorithms for quantum circuit optimization – smart replay buffer engineering alone can slash training time by up to 90% and boost sample efficiency by 32x.
Ditch KL divergence for IPMs in Bayesian experimental design and watch your credible sets tighten and your designs stabilize, even when your model's a bit off.
Uncover hidden GFlowNet training dynamics with GFlowState, a visual analytics tool that reveals how these models explore the sample space and shift sampling probabilities.
Solve new PDEs 100x faster with 10x less error by learning a transferable PINN representation and adapting to new equations with a single closed-form calculation.
Forget philosophical debates: a practical "learning mechanics" is crystallizing to explain *how* deep learning works, not just *why* it should.
Ditching noisy SGD trajectories for smooth Bezier curves unlocks better dataset condensation, especially when data is scarce.
ML models can accurately predict quantum properties out-of-distribution, but still fail to accelerate SCF convergence – until now.
N-gram models can rival neural networks in event log prediction, but the secret sauce is a smart ensemble method that dynamically promotes the best model during inference.
Forget ReLU's rough edges: a new family of smooth activation functions, GEM, closes the gap with GELU and even outperforms it in some cases, revealing a surprising architecture-dependent sweet spot for smoothness.
Achieve state-of-the-art periodic signal denoising with a single, lightweight dilated CNN that generalizes across frequencies via resampling.
Controller design can be effectively framed as inference, enabling efficient trajectory and policy optimization via tempered sampling.
Integrating deep learning forecasting with MILP optimization slashes inventory costs by 5.4% and stockouts by 27.5% in textile and PPE supply chains.
RL policies don't have to be temporally incoherent messes: shaping action probabilities with dynamical priors unlocks structured, interpretable decision-making.
PINNs can now efficiently solve highly oscillatory wave equations in heterogeneous media, thanks to a Green's function-based integral formulation that cuts computation by 10x and avoids absorbing boundary layers.
Even when your variational approximation is wrong, symmetries in the target distribution can guarantee you still get the mean right.
Forget compressing entire tokens – selectively routing *parts* of tokens based on query relevance unlocks better compression-quality tradeoffs in LoRA-adapted transformers.
Stop punishing your model for disagreeing with corrupted data – Trust-SSL learns better representations by treating alignment with degraded views as a residual learning problem, not a hard constraint.
Forget cross-entropy: a differentiable loss based on the Matthews correlation coefficient (MCC) can boost your classification performance by nearly 5% in F1 score and 8.5% in MCC.
Halving the parameter count of LLMs without sacrificing performance is now possible with Hyperloop Transformers, thanks to looped layers and hyper-connected residual streams.
Existing bounds on system identification are too pessimistic, but a new martingale-based analysis unlocks near-optimal finite-sample guarantees for parameter estimation in linear dynamical systems.
Vector-based fine-tuning just got an 8x speed boost, rivaling LoRA's performance with a fraction of the parameters, thanks to a clever gradient-informed initialization.
Simple neural networks can accurately emulate complex aerosol microphysics in climate models, but only with careful attention to scaling and training convergence.
Forget slow convergence and inaccessible Hessians: this new de-biased covariance estimator turbocharges SGD with faster, more accurate uncertainty estimates.
Forget repeatedly re-running inference on residual graphs: this GNN-guided Ford-Fulkerson algorithm learns edge importance probabilities to dramatically accelerate max-flow computation and image segmentation.
Recurrent Transformers let you trade model depth for width, slashing KV cache memory footprint and inference latency without sacrificing performance.
A transformer-based deep learning approach can not only drastically accelerate Unit Commitment problem-solving but also, surprisingly, find lower-cost operational schedules than traditional MILP solvers in certain instances.
FedLLMs, thought to be safer due to data localization, are shockingly vulnerable: a new attack achieves near 100% membership inference accuracy, even with differential privacy.
Achieve high-fidelity image enhancement on mobile devices even after quantization by training a model that anticipates and adapts to low-precision representations.
Forget scaling laws – AgenticQwen proves that clever training with dual data flywheels can enable small language models to rival giants in real-world agentic tasks.
Forget generating static shapes – Sculpt4D now lets you efficiently sculpt dynamic 4D objects with state-of-the-art temporal coherence.
Training a video reshooting model on internet-scale monocular videos is now possible, thanks to a clever self-supervision trick that generates multi-view training data from a single video.
Domain shifts and novel classes at test time can be tamed by nudging features back towards the source distribution, even for out-of-distribution examples.
Frozen vision foundation models can be surprisingly effective at improving out-of-domain object detection by stabilizing relational modeling and semantic-spatial alignment in the detector.
Achieve a 10x speedup in detecting tiny objects in massive satellite images without sacrificing accuracy, even on a single GPU.
Current 3D Gaussian Splatting methods are too unpredictable for real-world use, but YOGO makes them deterministic and production-ready.
Steal accuracy from dense models and stabilize MoE training with a simple teacher-guided routing scheme that combats gradient starvation.
LMMs can gain surprising robustness and visual understanding by learning to denoise corrupted visual tokens, even without extra inference overhead.
Edge devices can now learn continuously from visual data with 40x faster speed and 380x better energy efficiency, thanks to a novel FPGA accelerator design.
Unlock real-time, high-quality 3D scene reconstruction from unconstrained images with varying lighting, thanks to a feed-forward Gaussian Splatting model that learns appearance embeddings.
Data loading bottlenecks can strangle your GPU utilization down to 10%, but a few smart optimizations can unlock a 6x speedup.
SIMD parallelism can finally unlock substantial speedups in large-number arithmetic by rethinking algorithms around data-parallel operations, yielding up to 19.3% throughput gains in scientific computing.
A server-driven adaptive sampling approach slashes power consumption in wireless iBCIs by 40 mW while *improving* decoding accuracy.
Forget text-only pre-training: training on music *first* can dramatically accelerate language learning in small language models.
By dynamically injecting frequency-aware n-gram features, X-GRAM achieves state-of-the-art accuracy with smaller embedding tables, offering a practical path to scaling memory-augmented architectures.
Achieve zero global downtime in large-scale pre-training, even with millions of simulated chip failures, by decoupling learners and asynchronously aggregating parameter updates.
A surprisingly simple tweak to Hartigan's k-means algorithm unlocks another 2-5% accuracy boost, especially when clustering high-dimensional data.
Spectral analysis of client feature representations can identify and relabel noisy data in federated learning, outperforming existing noise-tolerant loss and loss-dynamic approaches.
Current evaluation metrics for trajectory inference can mislead researchers, but functional KL divergence offers a clearer, more reliable comparison of methods in sparse data conditions.
Fixed-width attention spans can give you better grammar and human-like reading patterns, especially when you're short on training data.
Ditch the GNN training: this label propagation method matches or beats GNN accuracy while being far more computationally efficient, even on tricky heterophilous graphs.
Layer-selective rehearsal and rapid recovery strategies can boost model performance in federated learning by over 30% in real-world applications.
Get 82x faster Bayesian inference for equipment monitoring by replacing MCMC with neural nets trained on simulated data.
Forget fine-tuning behemoth LLMs for every new task – this paper shows how a tiny, nimble model generating smart supplements can unlock surprisingly strong agentic performance from frozen giants.
Unlock 10x faster simulation-based inference in hierarchical models by training on single-site simulations and assembling synthetic multi-site data.
Differentially private federated learning gets a boost: PINA achieves 2.9% higher accuracy than state-of-the-art methods by using a novel two-stage approach with privacy-preserving initialization and normality-driven aggregation.
Geometry-aware optimization can dramatically improve LLM alignment by ensuring fairer trade-offs among conflicting human values.
Bayesian mixture-of-experts models can achieve robust density and parameter estimation with adaptive expert selection, fundamentally reshaping our approach to complex probabilistic modeling.
Calibration can be effectively improved during training by focusing on curvature and margin dynamics, leading to better confidence estimates without sacrificing model performance.
VDC achieves high-dimensional density estimation with remarkable speed and accuracy, transforming the landscape of copula modeling.