100 papers published across 6 labs on Lattice.
By modeling policy gradients as Gaussian processes, this work dramatically reduces the sample complexity in reinforcement learning, offering faster convergence and uncertainty estimates at little extra cost.
Distributed GPU training slashes the time needed to train deep learning models for CFD, making accurate fluid simulation predictions accessible in a fraction of the time.
Unlock face recognition with just one labeled example and a flood of unlabeled data, achieving state-of-the-art accuracy in a practical authentication scenario.
Get the best of both worlds: Linear-Core Surrogates offer the fast optimization of smooth losses and the statistical efficiency of margin-based losses, without sacrificing differentiability.
Q-learning can now tackle mean-field control problems with common noise, even when the ideal data is unobservable, opening the door to more realistic and complex multi-agent control scenarios.
Decision trees and diffusion models are secretly doing the same thing: optimizing a shared objective called Global Trajectory Score Matching.
Jointly training the tokenizer and autoregressive model slashes ImageNet FID to 1.48, finally making end-to-end autoregressive image generation competitive.
PINNs get a wavelet makeover, adaptively focusing on high-magnitude source regions and leaving vanilla methods in the dust on PDEs with extreme loss imbalances.
Signal processing practitioners gain a coherent roadmap for deploying sequential Gaussian Processes in real-world systems, bridging the gap between ML advances and practical application.
A single neural net can now solve 24 different multi-depot vehicle routing problems, thanks to a clever modulation technique that adapts to varying constraints.
Shuffling data introduces a fundamental shift in the privacy-utility tradeoff for mean estimation, rendering locally differentially private (LDP) mechanisms suboptimal.
Stop blurring the details: structure-aware Gaussian Splatting densification uses frequency analysis to resolve high-frequency textures faster and with higher quality.
Kernel smoothing, a classic technique from nonparametric statistics, can make reinforcement learning with LLMs more sample efficient.
Ditch the Transformers: a cleverly designed all-MLP architecture, ITS-Mina, rivals state-of-the-art time series forecasting while slashing computational costs.
By combining Newton's method with adaptive gradient descent, this attractor FCM sidesteps premature convergence, offering a more robust approach to learning in complex cognitive maps.
Achieve perfect train-test error tracking with a new training algorithm, Decoupled Descent, that eliminates the need for validation sets in certain stylized settings.
By intelligently perturbing class prototypes based on their discriminative power, VPDR achieves a superior privacy-utility trade-off in federated learning compared to naive Gaussian noise.
You can accurately predict steel hardness from nanoindentation data with a tiny dataset and some clever physics-based data augmentation, even when traditional methods fail.
Transformers, typically considered inefficient for spin system sampling, can now outperform CNN-based samplers by generating groups of spins, unlocking larger system sizes and higher effective sample sizes.
Foundation model embeddings reveal hidden structure in federated datasets, enabling surprisingly effective client clustering without any training or communication overhead.
Adversarial perturbations in LLMs have an exploitable low-rank structure, enabling more efficient and effective black-box attacks.
Hyperbolic embeddings and denoising diffusion can significantly boost few-shot learning on graphs, outperforming existing Euclidean-based methods.
Optimizing against the worst-case *sampler*, not just the nominal distribution, yields more stable decisions and better generalization in stochastic optimization with generative models.
CNNs are surprisingly fragile to even single-pixel shifts, but strategically placed global average pooling can fix this with a 98% parameter reduction and no accuracy loss.
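A minimal sketch of the property this relies on, in plain Python: global average pooling (GAP) gives the same output for a circularly shifted feature map, while flattening (as used before a dense classifier head) does not. The helper names and the toy 1×3×3 feature map are illustrative assumptions. With zero padding or strided layers, GAP is only approximately shift-invariant, and this is not the paper's architecture.

```python
def global_avg_pool(fmap):
    """Average each channel of a C x H x W feature map (nested lists) to one number."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def circular_shift(fmap, dy, dx):
    """Shift every channel by (dy, dx) pixels with wrap-around."""
    out = []
    for ch in fmap:
        h, w = len(ch), len(ch[0])
        out.append([[ch[(i - dy) % h][(j - dx) % w] for j in range(w)] for i in range(h)])
    return out

def flatten(fmap):
    """Flatten C x H x W into one vector, as a dense head would consume it."""
    return [v for ch in fmap for row in ch for v in row]

fmap = [[[1.0, 2.0, 0.0], [0.0, 5.0, 1.0], [3.0, 0.0, 0.0]]]
shifted = circular_shift(fmap, 0, 1)  # move everything one pixel to the right

print(global_avg_pool(fmap) == global_avg_pool(shifted))  # pooled features unchanged
print(flatten(fmap) == flatten(shifted))                  # flattened features differ
```

Because a dense layer on flattened features sees a different input after the shift, its prediction can change. A pooled head only sees per-channel averages, and it also needs far fewer parameters.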
Current open-world semi-supervised learning methods fall short in practice because they fail to extract latent semantic information; SECOS overcomes this by directly predicting textual labels from a candidate set, achieving state-of-the-art results.
Stop those blurry edges: Softmax-GS uses learnable competition between Gaussians to sharpen 3D Gaussian Splatting, achieving state-of-the-art performance in novel view synthesis.
Cerebras CS-3 can deliver 100x speedups over CPU for sparse matrix multiplication at 90% sparsity, but surprisingly, becomes *slower* than CPU beyond 99% sparsity.
VLMs can get a boost in long-tail performance and train more efficiently by dynamically upsampling underrepresented data clusters each epoch.
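One way to implement per-epoch upsampling, sketched under these assumptions: cluster ids are already given (for example, from clustering embeddings), and each example is weighted by an inverse power of its cluster's size. The function name and the `power` knob are hypothetical and do not reproduce the paper's scheme. Calling the function again each epoch with refreshed cluster ids supplies the "dynamic" part.

```python
import random
from collections import Counter

def epoch_indices(cluster_ids, power=0.5, seed=None):
    """Draw one epoch's worth of sample indices, upsampling small clusters.

    Each example gets weight (cluster size) ** -power:
    power=0 is uniform sampling; power=1 gives every cluster equal total mass.
    """
    sizes = Counter(cluster_ids)
    weights = [sizes[c] ** -power for c in cluster_ids]
    rng = random.Random(seed)
    # Sample with replacement, so rare-cluster examples can repeat within an epoch.
    return rng.choices(range(len(cluster_ids)), weights=weights, k=len(cluster_ids))

ids = ["common"] * 90 + ["rare"] * 10
epoch = epoch_indices(ids, power=1.0, seed=0)
print(sum(ids[i] == "rare" for i in epoch), "rare draws out of", len(epoch))
```

Intermediate values of `power` trade off long-tail coverage against over-repeating a handful of rare examples.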
Forget per-scene optimization: GenWildSplat achieves state-of-the-art 3D reconstruction from sparse, unposed images in real-time using a purely feed-forward approach.
Red-teaming long-context LLMs just got a whole lot cheaper: FlashRT slashes the compute and memory costs of prompt injection attacks by up to 7x.
Fréchet Distance, previously deemed impractical for training, unlocks surprisingly high-fidelity image generation when optimized in representation space with decoupled batch sizes.
Stop letting SFT ruin your LMMs: PRISM uses on-policy distillation to realign your model *before* RL, boosting performance by up to 6%.
Forget storing full task-specific models – Auto-FlexSwitch compresses the knowledge into tiny, dynamically assembled task vectors, slashing storage costs without sacrificing accuracy.
By fusing Bayesian neural networks with Kalman filtering, this work achieves more accurate and robust UAV state estimation than traditional methods in noisy, sparse sensing environments.
Cut your LLM fine-tuning costs by 30% without sacrificing accuracy by intelligently sampling training data based on cost.
Gradient cancellation during fine-tuning can be tamed by simply scaling down the gradients of correctly classified examples, leading to more stable and accurate models.
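A minimal sketch of the idea, assuming plain softmax cross-entropy and a hard correct/incorrect split. For each example, the gradient with respect to the logits is p − onehot(y). Here it is multiplied by a constant `correct_scale` when the argmax prediction already matches the label. Both the constant and the hard threshold are illustrative assumptions, not the paper's exact rule.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_ce_grads(batch_logits, labels, correct_scale=0.1):
    """Per-example cross-entropy gradients w.r.t. the logits, with correctly
    classified examples down-weighted by the hypothetical `correct_scale`."""
    grads = []
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        pred = max(range(len(logits)), key=logits.__getitem__)
        w = correct_scale if pred == y else 1.0
        # dCE/dz_k = p_k - 1[k == y], then rescaled by the example weight.
        grads.append([w * (p - (1.0 if k == y else 0.0)) for k, p in enumerate(probs)])
    return grads

batch = [[2.0, 0.0], [0.0, 2.0]]  # example 0 is classified correctly, example 1 is not
labels = [0, 0]
print(scaled_ce_grads(batch, labels))
```

Misclassified examples keep their full gradient, so they dominate the batch update. Correct ones contribute only a small residual push.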
A single KL identity unlocks a surprisingly simple and unified derivation of core results for exponential families, streamlining the theoretical foundations of variational inference, entropy-regularized RL, and RLHF.
Combining diverse AI prediction tools as a Mixture of Experts slashes variance in semi-supervised inference, outperforming standard Prediction-Powered Inference.
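For reference, the standard Prediction-Powered Inference (PPI) estimate of a mean takes the average prediction on unlabeled data and corrects it by the average prediction error (the rectifier) measured on the labeled data. The sketch below implements that baseline and feeds it a hypothetical fixed-weight mixture of several predictors. PPI stays unbiased for any predictor chosen independently of the labeled set. How the paper's Mixture of Experts sets its weights to reduce variance is not reproduced here.

```python
from statistics import fmean

def ppi_mean(preds_unlabeled, preds_labeled, y_labeled):
    """Prediction-powered estimate of E[Y]: mean prediction on unlabeled data
    minus the mean prediction error (rectifier) on the labeled data."""
    rectifier = fmean(p - y for p, y in zip(preds_labeled, y_labeled))
    return fmean(preds_unlabeled) - rectifier

def mix(expert_preds, weights):
    """Combine several predictors' outputs with fixed weights (hypothetical
    mixture; the weights must not be fit on the labeled set used above)."""
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*expert_preds)]

a_unl, b_unl = [1.0, 2.0, 3.0], [1.2, 2.2, 2.8]  # two predictors on unlabeled inputs
a_lab, b_lab = [1.0, 2.0], [1.0, 2.4]            # the same predictors on labeled inputs
y_lab = [1.5, 2.0]                               # the few ground-truth labels
w = [0.5, 0.5]

print(ppi_mean(mix([a_unl, b_unl], w), mix([a_lab, b_lab], w), y_lab))
```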
Tabular foundation models can dramatically accelerate robot policy learning by enabling efficient global exploration within dynamically constructed policy subspaces.
Modular architectures in continual learning only matter when representational dimensionality is low, revealing that dimensionality acts as a key control knob for the benefits of structural separation.
Forget learning to answer – ANCORA shows language models can master verifiable reasoning by learning to *question* themselves.
Ditch the encoder-decoder: LPWTNet's closed-form Laplacian pyramid decomposition offers efficient inference for statistical channel fingerprint construction in massive MIMO systems.
Self-supervised encoders implicitly perform soft clustering on a "predictive manifold" in probability space, and this geometric perspective yields a practical Gaussian regularizer (SIGReg) competitive with variational IB.
Expert imbalance can cripple learning-to-defer systems, but a novel cost-sensitive margin-based loss function can restore performance.
Domain-adapting LLMs for EDA requires explicit RAG scenario training to prevent performance degradation, and QA augmentation during corpus construction further boosts performance.
Forget scaling up data volume: repeating a smaller, high-quality German dataset yields superior language models compared to single-pass training on a larger, less filtered corpus.
Get 4x-10x smaller LoRA models for free with a simple post-processing step that doesn't hurt performance.
Discovering new molecules and materials just got 10x cheaper, thanks to a hybrid AI method that blends generative models with physics-based search.
LLM training bottlenecks? ZipCCL achieves up to 1.18x end-to-end speedups by losslessly compressing communication collectives, without sacrificing model quality.
LLMs can edit code 30% faster and cheaper without sacrificing accuracy, simply by learning to choose between generating full code and structure-aware diffs.
Lattice reduction, long a dark art, can now be understood as minimizing variance in a Gram-Schmidt profile, leading to new, efficient heuristics.
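A small sketch of the quantity involved. The Gram-Schmidt profile of a basis is the list of log-norms log‖b_i*‖ of its Gram-Schmidt vectors. Their sum is the log of the lattice volume (log|det B| for a square basis), so it is fixed by the lattice. Lowering the variance therefore means flattening the profile, which is roughly what reduction aims for. The code only computes the profile and its variance for two bases of ℤ². It does not implement the paper's heuristics.

```python
import math
from statistics import pvariance

def gram_schmidt_profile(basis):
    """Return log ||b_i*|| for each Gram-Schmidt vector of a row basis.
    The rows must be linearly independent (otherwise log(0) fails)."""
    ortho = []
    for b in basis:
        v = list(b)
        for u in ortho:
            # Classical Gram-Schmidt: remove the component of b along u.
            mu = sum(x * y for x, y in zip(b, u)) / sum(x * x for x in u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return [0.5 * math.log(sum(x * x for x in v)) for v in ortho]

# Two bases of the same lattice Z^2: a skewed one and the standard one.
skewed = gram_schmidt_profile([[1.0, 1.0], [1.0, 2.0]])
reduced = gram_schmidt_profile([[1.0, 0.0], [0.0, 1.0]])
print(sum(skewed), pvariance(skewed))    # sum ~0 (det = 1), positive variance
print(sum(reduced), pvariance(reduced))  # same sum, zero variance
```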
Guaranteeing topological consistency in image segmentation is now possible within deep learning frameworks thanks to a novel differentiable simple point computation method applicable to continuous-valued images.
Seemingly innocuous augmentations like blur can cripple self-supervised learning for fine-grained tasks like plant identification, but domain-aware choices unlock surprisingly strong performance.
NeRFs get a boost in video reconstruction quality by explicitly modeling inter- and intra-ray similarities with a novel transformer architecture.
Initializing prompts in flatter regions of the loss landscape dramatically improves calibration and performance in test-time prompt tuning for vision-language models.
Stop wasting compute pre-training on domain-specific datasets; this simple strategy lets you pre-train on ImageNet and still achieve state-of-the-art results on diverse remote sensing segmentation tasks.
Ditch the clunky external tools: VeraRetouch slashes model size and unlocks end-to-end training for photo retouching with a fully differentiable architecture.
Continuous-depth transformers, augmented with physics-informed loss, can significantly improve short-term weather forecasting, suggesting a promising path for hybrid physics-aware AI models.
Training LLMs on ultra-long contexts just got a whole lot easier: AutoSP automates sequence parallelism and activation checkpointing, boosting context length by up to 2.7x with negligible throughput cost.
Fixing your parallelism strategy while tuning batch size (or vice versa) leaves performance on the table: COPUS adaptively co-tunes both for faster LLM training.
Automating CUTLASS kernel synthesis and auto-tuning lets you get 2.79x speedups on real models like MiniGPT just by having an LLM rewrite your PyTorch.
Asynchronous RL for LLMs doesn't have to sacrifice convergence for speed: DORA achieves 2-4x faster training by cleverly managing multiple policy versions during rollout.
Training complex multi-agent RL policies just got 3,500x faster thanks to a new engine that optimizes for memory access and data locality.
Skip the SCF convergence grind: a physically-constrained equivariant neural net slashes the number of iterations needed by up to 81% while also predicting accurate molecular properties in a single shot.
Transfer learning can unlock scalable emission control across diverse waste incineration plants by learning transferable system-level structures that capture physical constraints, operating-regime heterogeneity, and carbon-pollutant coupling.
Forget grid search: LLMs can rapidly find energy-efficient inference parameters, outperforming traditional optimization methods with just a few human-guided prompts.
Quantum circuits can boost classical U-Net performance in remote sensing image segmentation, even with shallow, parameter-efficient designs.
Overlapping validation and private-data acquisition of successive blocks with state-consistency checks and ledger updates can almost double Hyperledger Fabric's commit throughput.
Shrinking diffusion LLMs by distilling across different architectures can yield surprisingly strong performance, even boosting code generation scores by 16 points on HumanEval.
By co-evolving experts through bidirectional policy distillation, CoPD achieves all-in-one integration of text, image, and video reasoning, outperforming domain-specific experts and suggesting a new training paradigm.
Ditch softmax attention for sigmoid: it unlocks 25% better cell-type separation, 10% faster training, and rock-solid stability for biological foundation models.
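A minimal sketch of the swap in plain Python. In softmax attention, each query's weights are normalized across all keys. Sigmoid attention instead passes every score through an independent sigmoid. The `-log(n)` bias added to the scores is an assumption (one choice discussed in the sigmoid-attention literature). The function is a toy single-head version, not the model from the paper.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def attention(q, k, v, use_sigmoid=True, bias=None):
    """Toy single-head attention over lists of vectors."""
    d = len(q[0])
    if bias is None:
        bias = -math.log(len(k))  # assumed bias, scaled to the number of keys
    out = []
    for qi in q:
        # Scaled dot-product scores against every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if use_sigmoid:
            # Each weight is computed independently; a row need not sum to 1.
            weights = [sigmoid(s + bias) for s in scores]
        else:
            # Softmax couples all keys through a shared normalizer.
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

q = [[0.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]  # one-hot values, so outputs equal the weights
print(attention(q, k, v, use_sigmoid=True))
print(attention(q, k, v, use_sigmoid=False))
```

Dropping the row-wise normalizer removes the cross-key reduction from the inner loop, which is one reason sigmoid attention can be cheaper to fuse into kernels.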
Frontier models are wasted on routine GUI tasks: a step-level cascade that adaptively invokes stronger models only when lightweight monitors detect progress stalls or semantic drift slashes compute costs without sacrificing performance.
Fine-tune massive LLMs like Qwen3-235B with 31K context on a single 8x RTX 4090 server, thanks to a novel pipeline schedule that eliminates the weight binding bottleneck.
Speculative decoding, typically used post-RL, can be integrated directly into RL training loops to accelerate LLM rollout generation by up to 2.5x.
Rope-assisted climbing robots can now nimbly navigate complex vertical terrains thanks to a new bi-level optimization strategy that coordinates foothold selection and dynamic motion.
Rule-based high-level coaching can drastically improve the safety and sample efficiency of goal-conditioned RL agents in UAV missions, even without pretraining.
Quantization crushes large object detection models for edge deployment, but knowledge distillation can resurrect them, even surpassing their original floating-point precision in a much smaller package.
Unlock 3x higher throughput in your data center by easily converting MPI applications to malleable jobs with a new library.
Training a 1024-node SOM on a billion-sample dataset in just over 6 minutes shatters previous scalability limits, thanks to a novel framework that leverages multi-GPU execution, out-of-memory streaming, and flexible topologies.
Squeeze more out of your hardware: TSP lets you shard both weights and activations across the same devices, unlocking memory savings for long-context training and inference.
Fine-tuning LLMs in federated settings just got easier: SplitFT lets clients adapt their cut layers and LoRA ranks, boosting performance and slashing communication costs.
Naive RAPL-based energy monitoring can add nearly 50% overhead to your measurements, but optimized tools can keep it negligible.
Curriculum learning flips the script on what language structures LMs find "easy," suggesting that training order is a critical factor in shaping their inductive biases.
Subword tokenization's secret sauce isn't just vocabulary size – it's the boosted training throughput and the subtle linguistic priors baked into subword boundaries.
Dynamic quantization, a widely adopted optimization for efficient ML serving, can leak your data to adversaries sharing the same batch.
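A toy illustration of the mechanism. It assumes symmetric 8-bit quantization with a single per-tensor scale computed at runtime over the whole batch. Because the scale comes from the batch-wide maximum, one request's quantized values change when another request in the same batch changes. Per-row or per-token scales would not couple rows this way. The function is hypothetical and is not the attack from the paper.

```python
def dynamic_quantize_batch(batch, bits=8):
    """Quantize-then-dequantize a batch with one symmetric scale shared by all
    rows, derived at runtime from the largest absolute value in the batch."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(x) for row in batch for x in row)
    if peak == 0:
        return [list(row) for row in batch]
    scale = peak / qmax
    return [[round(x / scale) * scale for x in row] for row in batch]

victim = [0.3, -0.7, 0.12]  # one request's activations
with_small = dynamic_quantize_batch([victim, [0.5, 0.1, -0.2]])[0]
with_large = dynamic_quantize_batch([victim, [50.0, 0.1, -0.2]])[0]
print(with_small)  # fine-grained: the victim's own values set the scale
print(with_large)  # coarse: the co-batched outlier set the scale
```

Anyone who can observe the quantized output for one row, or its downstream effects, learns something about the largest magnitude elsewhere in the batch.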
SMBs drowning in security logs can now achieve enterprise-grade threat detection with a lightweight, open-source framework built on a fine-tuned tiny LLM.
Genetic programming can automatically discover lightweight, generalizable feature extractors for time series classification that outperform standard methods.
Stuck training your reasoning model with RLVR due to a low initial success rate? This paper shows how a Tsallis q-logarithm loss can jumpstart learning by adaptively amplifying gradients, achieving a +14.4 point boost over GRPO on HotPotQA.
Imperfect rewards can actually *help* policy gradient methods escape local optima, challenging the conventional wisdom that reward accuracy is always paramount.
Forget replay buffers: TSN-Affinity shows that similarity-guided parameter reuse in TinySubNetworks can achieve strong performance in continual offline RL.
Fine-tuning language models with a graph-guided loss that captures global semantic relationships can significantly boost classification accuracy and convergence speed.
Ditch deflation: A new sparse discriminant analysis method sidesteps error propagation and achieves state-of-the-art accuracy by estimating all discriminant vectors simultaneously.
Teacher forcing, while effective for training RNNs on chaotic systems, fundamentally mismatches the optimization geometry of the true marginal likelihood, potentially harming the learned dynamics.
Slash your LLM's carbon footprint by up to 81% without sacrificing performance using a compression pipeline inspired by carbon taxation.
Augmenting few-shot knowledge distillation with adaptively selected, teacher-confident GAN-generated images dramatically boosts student accuracy.
Federated learning can achieve better accuracy-efficiency trade-offs under heterogeneous data by optimizing within a low-dimensional subspace and using a backfill-style update to retain residual components.
Even when you think you're only teaching a model what *not* to do, sustained gradient alignment can lead to the unintended acquisition of undesirable traits.
Differentiable physics unlocks adaptable and scalable phase retrieval for coherent transition radiation spectroscopy, outperforming traditional methods by seamlessly incorporating complex experimental effects.
Patchwork learning gets a boost: GraphPL uses GNNs to flexibly integrate all observed modalities, achieving SOTA imputation performance even with noisy inputs.
Skip the retraining: AM-SGHMC lets you apply a single trained MCMC sampler to various Bayesian updating problems for similar structures.