24 papers from NVIDIA Research on Training Efficiency & Optimization
Humanoid robots can now traverse complex terrains with human-like gaits, thanks to a surprisingly simple and efficient framework that eschews adversarial training.
Ditch fixed compute budgets: this new flow-matching method for robotic control adaptively allocates computation, speeding through simple tasks and devoting more compute to complex ones.
Humanoid robots can now handle heavy, unknown payloads in the real world thanks to a system that identifies mass distribution via differentiable simulation.
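A minimal sketch of identification-by-differentiable-simulation on a toy 1D point mass (the dynamics, names, and optimizer settings are illustrative, not the paper's system): an unknown mass is recovered by backpropagating a trajectory-matching loss through the rollout.

```python
# Fit an unknown payload mass by gradient descent through a differentiable
# simulator. Toy 1D dynamics (F = m * a), illustrative only.
import torch

dt, steps = 0.01, 200
true_mass = torch.tensor(3.0)
forces = torch.sin(torch.linspace(0, 6.28, steps))  # known applied forces

def rollout(mass):
    """Differentiable forward simulation of a 1D point mass."""
    pos, vel, traj = torch.tensor(0.0), torch.tensor(0.0), []
    for f in forces:
        acc = f / mass                 # dynamics stay differentiable w.r.t. mass
        vel = vel + acc * dt
        pos = pos + vel * dt
        traj.append(pos)
    return torch.stack(traj)

observed = rollout(true_mass)          # stand-in for real robot measurements

mass = torch.tensor(1.0, requires_grad=True)  # initial guess
opt = torch.optim.Adam([mass], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = torch.mean((rollout(mass) - observed) ** 2)
    loss.backward()                    # gradients flow through the whole rollout
    opt.step()

print(f"estimated mass: {mass.item():.3f} (true: {true_mass.item():.1f})")
```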
Now you can predict the structure of biomolecular assemblies exceeding 30,000 residues, thanks to a new context parallelism framework that shatters previous memory constraints.
Domain skew in federated learning can be tamed by decoupling and calibrating domain-specific features, leading to more consistent and generalizable global models.
Training trillion-parameter Mixture-of-Experts models just got a whole lot faster: Megatron Core now achieves >1 PFLOP/s per GPU on NVIDIA's latest hardware.
Injecting curvature information into MLIP training via Hessian-vector products achieves the accuracy of full-Hessian training with >24x speedups, opening the door to more efficient and accurate potential energy surface learning.
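The Hessian-vector product itself is a standard double backward; here is a minimal sketch against a toy energy network (the model and dimensions are illustrative, and a real MLIP pipeline would regress such H·v probes against reference curvature):

```python
# Hessian-vector product via double backprop in PyTorch. The energy network is
# a toy stand-in for an MLIP; the Hessian is taken with respect to atomic
# coordinates and is never materialized.
import torch

energy_net = torch.nn.Sequential(
    torch.nn.Linear(9, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
coords = torch.randn(9, requires_grad=True)       # flattened positions of 3 atoms

energy = energy_net(coords).squeeze()
grad = torch.autograd.grad(energy, coords, create_graph=True)[0]  # dE/dx

v = torch.randn(9)                                # random probe direction
hvp = torch.autograd.grad(grad @ v, coords)[0]    # H v from one extra backward pass

print(hvp.shape)  # torch.Size([9]): one curvature slice at near-gradient cost
```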
Achieve state-of-the-art results in high-resolution video geometry estimation by disentangling global coherence and fine detail using a dual-stream transformer architecture.
Forget everything you thought you knew about continual learning: pretrained Vision-Language-Action models can learn new robotic skills without catastrophic forgetting, even with minimal replay.
Representing tensor layouts with a hierarchical algebra unlocks powerful compile-time reasoning and simplifies the expression of tiling/partitioning patterns for specialized hardware.
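For flavor, a toy shape/stride layout in plain Python, in the spirit of CuTe-style algebras (this `Layout` class and the tiling example are illustrative, not the paper's formalism): coordinates map to linear indices via strides, and a tiled view is just another stride pattern over the same storage.

```python
# Toy layout algebra: a layout is (shape, stride); index = dot(coord, stride).
from itertools import product

class Layout:
    """Maps a logical coordinate to a linear index via its strides."""
    def __init__(self, shape, stride):
        self.shape, self.stride = shape, stride

    def __call__(self, *coord):
        return sum(c * s for c, s in zip(coord, self.stride))

    def coords(self):
        return product(*(range(n) for n in self.shape))

row_major = Layout((4, 4), (4, 1))

# Tiling as algebra: the same 4x4 row-major matrix viewed as 2x2 tiles of
# 2x2 elements, expressed as a 4-mode layout (tile_r, tile_c, r, c).
tiled = Layout((2, 2, 2, 2), (8, 2, 4, 1))

assert row_major(1, 2) == 6
assert tiled(0, 1, 1, 0) == row_major(1, 2)        # same element, tiled view
print(sorted(tiled(*c) for c in tiled.coords()))   # a permutation of 0..15
```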
Forget expensive data generation and unstable PINNs: this method trains neural PDE solvers with cheap, noisy Monte Carlo estimates, cutting L2 error by up to 8.75x.
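A minimal sketch of the recipe on a toy problem of my choosing, not the paper's setup: the heat equation u_t = 0.5*u_xx with u(x,0) = sin(x) has the Feynman-Kac form u(x,t) = E[sin(x + W_t)], W_t ~ N(0, t), so a handful of Brownian samples gives a cheap, noisy, unbiased regression target.

```python
# Regress a network on noisy but unbiased Monte Carlo estimates of a PDE
# solution instead of expensive exact data. Toy heat-equation setup.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.rand(256, 1) * 6.28          # space-time collocation points
    t = torch.rand(256, 1)
    w = torch.randn(256, 4) * t.sqrt()     # only 4 Brownian samples per point
    target = torch.sin(x + w).mean(dim=1, keepdim=True)  # cheap, noisy estimate
    loss = torch.nn.functional.mse_loss(net(torch.cat([x, t], dim=1)), target)
    opt.zero_grad(); loss.backward(); opt.step()

# The per-sample noise averages out: compare with the exact exp(-t/2) * sin(x).
x, t = torch.tensor([[1.0]]), torch.tensor([[0.5]])
print(net(torch.cat([x, t], dim=1)).item(), (torch.exp(-t / 2) * torch.sin(x)).item())
```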
Ditch quadratic scaling in 3D reconstruction: VGG-T$^3$ achieves linear scaling and an 11.6x speed-up by distilling scene geometry into a fixed-size MLP.
Test-time training with KV binding isn't memorization; it's secretly a learned linear attention mechanism, unlocking architectural simplifications and parallelization.
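The equivalence is easy to check numerically. A minimal sketch with toy dimensions and my own naming: binding key-value outer products into a fast-weight state and reading it with the query reproduces causal, unnormalized linear attention.

```python
# Recurrent KV-binding state vs. parallel causal linear attention.
import torch

T, d = 16, 8
q, k, v = (torch.randn(T, d) for _ in range(3))

# Recurrent view: the state S binds each key to its value; read with the query.
S = torch.zeros(d, d)
recurrent = []
for t in range(T):
    S = S + torch.outer(k[t], v[t])     # write: bind k_t -> v_t
    recurrent.append(q[t] @ S)          # read: y_t = sum_i (q_t . k_i) v_i
recurrent = torch.stack(recurrent)

# Parallel view: causal linear attention with the same (identity) feature map.
scores = (q @ k.T).tril()               # (q_t . k_i) for i <= t only
parallel = scores @ v

assert torch.allclose(recurrent, parallel, atol=1e-5)
print("KV-binding recurrence == causal linear attention")
```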
Forget hand-crafted datasets: a new synthetic data pipeline lets smaller LLMs beat giants at terminal tasks.
Unlock the potential of Kolmogorov-Arnold Networks with WS-KAN, a weight-space architecture that understands their hidden symmetries and predicts their performance far better than generic methods.
Forget robotics pre-training: ActionCodec, a new action tokenizer designed with information-theoretic principles, achieves state-of-the-art VLA performance on LIBERO.
Forget monolithic LoRAs: LoRWeB dynamically mixes a basis set of LoRAs to unlock SOTA generalization in visual analogy tasks.
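A minimal sketch of the mixing pattern, assuming an input-dependent router over a basis of low-rank deltas (the module and names are illustrative, not LoRWeB's actual architecture):

```python
# Mix a basis of LoRAs: the effective update is a weighted sum of low-rank
# deltas, with weights predicted per input by a small router.
import torch

class LoRAMixture(torch.nn.Module):
    def __init__(self, d_in, d_out, rank=4, n_basis=8):
        super().__init__()
        self.base = torch.nn.Linear(d_in, d_out)
        self.base.requires_grad_(False)                 # frozen backbone layer
        self.A = torch.nn.Parameter(torch.randn(n_basis, rank, d_in) * 0.02)
        self.B = torch.nn.Parameter(torch.zeros(n_basis, d_out, rank))
        self.router = torch.nn.Linear(d_in, n_basis)    # input-dependent mixing

    def forward(self, x):
        alpha = self.router(x).softmax(dim=-1)          # (batch, n_basis)
        low = torch.einsum('bd,nrd->bnr', x, self.A)    # per-basis down-project
        delta = torch.einsum('bnr,nor->bno', low, self.B)  # per-basis up-project
        return self.base(x) + torch.einsum('bn,bno->bo', alpha, delta)

y = LoRAMixture(32, 16)(torch.randn(4, 32))
print(y.shape)  # torch.Size([4, 16])
```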
GLM-5 doesn't just code; it engineers, showcasing unprecedented capability in tackling end-to-end software engineering challenges.
Uniform-state diffusion models, often overlooked in favor of masked diffusion, surprisingly outperform autoregressive and masked diffusion models on GSM8K when scaled to 1.7B parameters, despite worse perplexity.
Achieve state-of-the-art depth completion by adapting 3D foundation models at test time with minimal parameter updates, outperforming task-specific encoders that often overfit.
Pathology image analysis just got a whole lot greener: LitePath slashes computational costs by 400x while matching the accuracy of state-of-the-art models, making AI-powered diagnostics accessible on low-power edge devices.
Smaller reasoning models can achieve both higher accuracy and shorter reasoning chains by adaptively penalizing unnecessary reflections and coordinating length penalties with problem complexity.
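One way such a shaped reward could look, as a minimal sketch whose schedule and constants are my assumptions rather than the paper's: length is only penalized on correct answers, and the penalty fades as problem difficulty rises.

```python
# Difficulty-coordinated length penalty for RL on reasoning traces.
def shaped_reward(correct: bool, n_tokens: int, difficulty: float,
                  budget: int = 512, max_penalty: float = 0.5) -> float:
    """difficulty in [0, 1], e.g. the fraction of sampled rollouts that fail."""
    if not correct:
        return 0.0                      # never push wrong answers to be shorter
    overage = max(0.0, n_tokens / budget - 1.0)
    penalty = max_penalty * (1.0 - difficulty) * min(overage, 1.0)
    return 1.0 - penalty                # easy problem + long chain -> lower reward

print(shaped_reward(True, 1024, difficulty=0.1))  # easy and verbose: 0.55
print(shaped_reward(True, 1024, difficulty=0.9))  # hard and verbose: 0.95
```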
Forget fixed masking ratios: this new self-supervised learning approach for time-series data dynamically adjusts noise levels to extract richer, more versatile representations.
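A minimal sketch of the adaptive-corruption idea (the model, data, and schedule are illustrative, not the paper's method): grow the masking ratio as reconstruction improves, so the pretext task stays challenging.

```python
# Adaptive masking for time-series SSL: increase corruption once the model copes.
import torch

t = torch.linspace(0, 12.56, 128)
x = torch.sin(t).view(1, -1, 1).expand(8, -1, 4).clone()  # toy (batch, time, ch)
net = torch.nn.GRU(4, 4, batch_first=True)                # tiny reconstructor
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

ratio = 0.15                                   # start with gentle masking
for step in range(300):
    mask = torch.rand(8, 128, 1) < ratio       # mask `ratio` of the timesteps
    recon, _ = net(x.masked_fill(mask, 0.0))
    loss = ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    opt.zero_grad(); loss.backward(); opt.step()
    if loss.item() < 0.05:                     # reconstruction is easy: mask more
        ratio = min(0.6, ratio + 0.02)
print(f"final masking ratio: {ratio:.2f}")
```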
Open-source biomolecular modeling just got a boost: RF3 closes the gap with AlphaFold3 in structure prediction, thanks to the new AtomWorks data framework.