12 papers from NVIDIA Research on Training Efficiency & Optimization
Hierarchical power allocation in datacenters can achieve near-perfect satisfaction ratios, even with oversubscription, by using a novel three-phase QP/LP optimization policy.
Sonata outperforms traditional models in clinical kinematic assessments, achieving better fall-risk predictions with a fraction of the parameters.
Even moderate GPU fault rates can catastrophically derail LLM training, depending on the specific hardware datapath and numerical precision format.
Training autonomous vehicles can be dramatically sped up: MOSAIC achieves state-of-the-art driving performance with 80% less data by intelligently selecting training examples based on scaling laws.
Forget complex per-sample loss calculations: this simple three-line code injection uses batch loss smoothing to prune 20-50% of training data without sacrificing performance.
Now you can predict the structure of biomolecular assemblies exceeding 30,000 residues, thanks to a new context parallelism framework that shatters previous memory constraints.
Training trillion-parameter Mixture-of-Experts models just got a whole lot faster: Megatron Core now achieves >1 PFLOP/GPU on NVIDIA's latest hardware.
Injecting curvature information into MLIP training via Hessian-vector products achieves the accuracy of full-Hessian training with >24x speedups, opening the door to more efficient and accurate potential energy surface learning.
Forget monolithic LoRAs: LoRWeB dynamically mixes a basis set of LoRAs to unlock SOTA generalization in visual analogy tasks.
Uniform-state diffusion models, often overlooked in favor of masked diffusion, surprisingly outperform autoregressive and masked diffusion models on GSM8K when scaled to 1.7B parameters, despite worse perplexity.
Achieve state-of-the-art depth completion by adapting 3D foundation models at test time with minimal parameter updates, outperforming task-specific encoders that often overfit.
Smaller reasoning models can achieve both higher accuracy and shorter reasoning chains by adaptively penalizing unnecessary reflections and coordinating length penalties with problem complexity.