
NVIDIA Research
Research division of NVIDIA focusing on GPU-accelerated AI, computer graphics, robotics, and autonomous systems.
www.nvidia.com
Recent Papers
The authors extend the Puzzle post-training neural architecture search framework to optimize the gpt-oss-120B model, creating gpt-oss-puzzle-88B, by combining heterogeneous MoE expert pruning, selective attention replacement, FP8 quantization, and post-training reinforcement learning. This optimized model achieves significant per-token throughput speedups (up to 2.82X on a single H100 GPU) while maintaining or slightly exceeding the parent model's accuracy across various benchmarks. The paper advocates for request-level efficiency metrics to account for varying token counts and demonstrates that gpt-oss-puzzle-88B improves request-level efficiency by up to 1.29X.
Introduces a pipeline combining heterogeneous MoE expert pruning, selective attention replacement, FP8 quantization, and post-training reinforcement learning within the Puzzle framework to optimize large language models for inference.
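The request-level framing mentioned above can be made concrete with a little arithmetic. Below is a minimal sketch, with made-up numbers rather than results from the paper, of why a per-token speedup can overstate the end-to-end gain when the optimized model emits a different number of tokens per request.

```python
# Minimal sketch (illustrative numbers only): per-token throughput can diverge from
# request-level efficiency when two models emit different numbers of tokens per request.

def request_level_speedup(tps_base, tokens_base, tps_opt, tokens_opt):
    """Compare end-to-end request latency instead of raw tokens/second.

    tps_*    : decode throughput in tokens per second
    tokens_* : tokens generated for the same request by each model
    """
    latency_base = tokens_base / tps_base   # seconds to finish the request
    latency_opt = tokens_opt / tps_opt
    return latency_base / latency_opt       # >1 means the optimized model finishes sooner

# Hypothetical example: a 2.8x per-token speedup shrinks to roughly 1.3x at the request
# level if the optimized model generates about twice as many tokens for the same prompt.
print(request_level_speedup(tps_base=100, tokens_base=500, tps_opt=280, tokens_opt=1100))
```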
The paper introduces Fun-DDPS, a generative framework for carbon capture and storage (CCS) modeling that combines function-space diffusion models with differentiable neural operator surrogates for both forward and inverse problems. By decoupling the learning of a prior over geological parameters from the physics-consistent guidance provided by a Local Neural Operator (LNO) surrogate, Fun-DDPS effectively handles data sparsity and ensures physically realistic solutions. Experiments on synthetic CCS datasets demonstrate that Fun-DDPS significantly outperforms standard surrogates in forward modeling with sparse observations and achieves comparable accuracy to rejection sampling in inverse modeling, while also generating physically consistent realizations with improved sample efficiency.
Introduces a function-space decoupled diffusion framework (Fun-DDPS) that improves both the accuracy and physical realism of forward and inverse modeling in carbon capture and storage.
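For readers unfamiliar with surrogate-guided diffusion, the sketch below shows the generic pattern the summary alludes to: a learned prior score is combined with the gradient of a data misfit computed through a differentiable surrogate. It is a simplified Langevin-style update under assumed shapes and interfaces, not the Fun-DDPS algorithm itself.

```python
# Generic sketch of surrogate-guided diffusion sampling (not the exact Fun-DDPS update):
# a learned prior score over geological parameters is combined with the gradient of a
# data-misfit term computed through a differentiable surrogate of the forward physics.
import torch

def guided_step(x_t, t, score_model, surrogate, y_obs, step_size=1e-2, guidance_scale=1.0):
    """One illustrative reverse step.

    x_t         : current sample of the parameter field, shape (B, C, H, W)
    score_model : callable (x, t) -> estimated score of the learned prior
    surrogate   : differentiable forward operator mapping parameters to predicted observations
    y_obs       : sparse observations to honor
    """
    x_t = x_t.detach().requires_grad_(True)
    misfit = ((surrogate(x_t) - y_obs) ** 2).sum()         # physics-consistency term
    grad_misfit = torch.autograd.grad(misfit, x_t)[0]      # gradient through the surrogate
    with torch.no_grad():
        score = score_model(x_t, t)                        # prior score (decoupled from physics)
        x_next = x_t + step_size * (score - guidance_scale * grad_misfit)
        x_next = x_next + (2 * step_size) ** 0.5 * torch.randn_like(x_t)  # Langevin-style noise
    return x_next
```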
This paper introduces Flow-Guided Neural Operator (FGNO), a self-supervised learning framework for time-series data that leverages flow matching to dynamically adjust the corruption level during training. FGNO uses the Short-Time Fourier Transform (STFT) to handle varying time resolutions and extracts hierarchical features by applying different levels of noise across network layers and flow times. By training with noisy inputs but extracting representations from clean inputs, FGNO achieves state-of-the-art performance across multiple biomedical time-series tasks, demonstrating robustness to data scarcity and improved representation learning.
Introduces Flow-Guided Neural Operator (FGNO), a novel self-supervised learning framework that dynamically adjusts corruption levels during training using flow matching and extracts representations from clean inputs.
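The core trick described above, corrupting inputs with a flow-matching schedule during training while extracting representations from clean inputs, can be sketched as follows. The encoder, head, and feature dimensions are placeholders, not FGNO's architecture.

```python
# Minimal sketch of the "train on corrupted inputs, represent clean inputs" idea using a
# flow-matching-style corruption schedule; module names and sizes are placeholders.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 256), nn.GELU(), nn.Linear(256, 256))
head = nn.Linear(256, 128)  # predicts the flow-matching velocity target
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

def train_step(x_clean):
    """x_clean: batch of (spectro)temporal features, shape (B, 128)."""
    noise = torch.randn_like(x_clean)
    t = torch.rand(x_clean.shape[0], 1)                  # corruption level per sample
    x_t = (1 - t) * noise + t * x_clean                  # linear flow-matching interpolation
    target = x_clean - noise                             # velocity pointing from noise to data
    pred = head(encoder(x_t))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def extract_representation(x_clean):
    return encoder(x_clean)                              # downstream features come from clean inputs
```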
The paper introduces ArtisanGS, an interactive tool suite for selecting and segmenting 3D Gaussian Splats (3DGS) to enable controllable editing of in-the-wild captures. It presents a fast AI-driven method for propagating user-guided 2D selection masks to 3DGS selections, supplemented by manual selection and segmentation tools for user intervention. The toolset achieves binary segmentation of unstructured 3DGS scenes without additional optimization, and its utility is demonstrated through user-guided local editing with a custom video diffusion model.
Introduces an interactive tool suite, ArtisanGS, for versatile Gaussian Splat selection and segmentation, enabling user-guided editing via a novel AI-driven propagation method and manual tools.
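As a point of reference for what "propagating a 2D selection mask to 3DGS selections" involves, here is a simple geometric baseline that projects Gaussian centers into the masked view; the paper's AI-driven propagation is more sophisticated, and the camera convention below is an assumption.

```python
# Simple geometric baseline (not the paper's learned propagation) for lifting a 2D selection
# mask to a per-Gaussian selection: project each splat center into the view and test the mask.
import numpy as np

def select_gaussians(centers, mask, K, R, t, threshold=0.5):
    """centers: (N, 3) world-space Gaussian means; mask: (H, W) binary 2D selection;
    K: (3, 3) intrinsics; R, t: world-to-camera rotation and translation."""
    cam = centers @ R.T + t                  # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6
    pix = cam @ K.T
    pix = pix[:, :2] / pix[:, 2:3]           # perspective divide -> pixel coordinates
    h, w = mask.shape
    u = np.clip(np.round(pix[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(pix[:, 1]).astype(int), 0, h - 1)
    inside = (pix[:, 0] >= 0) & (pix[:, 0] < w) & (pix[:, 1] >= 0) & (pix[:, 1] < h)
    selected = in_front & inside & (mask[v, u] > threshold)
    return selected                          # boolean (N,), to be refined across views / by the user
```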
The paper introduces DuoGen, a general-purpose interleaved multimodal generation framework designed to improve the quality of models generating interleaved image and text sequences under general instructions. DuoGen constructs a large-scale instruction-tuning dataset from curated websites and synthetic examples and employs a two-stage decoupled training strategy using a pretrained multimodal LLM and a diffusion transformer (DiT). Experiments demonstrate that DuoGen outperforms existing open-source models in text quality, image fidelity, and image-context alignment, achieving state-of-the-art performance in text-to-image generation and image editing.
Introduces a two-stage decoupled training strategy for interleaved multimodal generation that combines a pretrained multimodal LLM for instruction understanding with a diffusion transformer (DiT) for image generation.
This paper evaluates the robustness of ten publicly available LLM safety guardrail models from major tech companies against 1,445 adversarial prompts across 21 attack categories. The study reveals a significant performance drop in all models when tested on novel, unseen prompts compared to public benchmarks, indicating potential training data contamination. A novel "helpful mode" jailbreak was also discovered in two models, where they generated harmful content instead of blocking it.
Demonstrates that current LLM safety guardrail models exhibit poor generalization to novel adversarial attacks, highlighting the limitations of relying solely on benchmark performance for evaluation.
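The evaluation protocol is straightforward to reproduce in outline: score each guardrail over prompts grouped by attack category and compare block rates on public-benchmark versus novel splits. The sketch below uses placeholder names, not the paper's harness.

```python
# Minimal sketch of the evaluation described above: run a guardrail classifier over
# adversarial prompts grouped by attack category and report the block rate per group.
# `guardrail` and the prompt records are placeholders, not the paper's artifacts.
from collections import defaultdict

def block_rates(guardrail, prompts):
    """prompts: iterable of dicts like {"text": str, "category": str}.
    guardrail: callable returning True when the prompt is flagged/blocked."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for p in prompts:
        total[p["category"]] += 1
        flagged[p["category"]] += int(guardrail(p["text"]))
    return {cat: flagged[cat] / total[cat] for cat in total}

# Running the same model on a public benchmark split and on a held-out novel split makes
# the generalization gap reported in the paper directly visible.
```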
The authors introduce Cosmos-Predict2.5, a flow-based video foundation model for physical AI that unifies Text2World, Image2World, and Video2World generation, leveraging a vision-language model for improved text grounding. Trained on 200M video clips and refined with reinforcement learning, Cosmos-Predict2.5 demonstrates significant improvements in video quality and instruction alignment compared to its predecessor, with models released at 2B and 14B scales. They also present Cosmos-Transfer2.5, a control-net style framework for Sim2Real and Real2Real world translation, achieving higher fidelity and robust long-horizon video generation despite being smaller than Cosmos-Transfer1.
Introduces a unified video foundation model, Cosmos-Predict2.5, and a Sim2Real/Real2Real translation framework, Cosmos-Transfer2.5, for scaling embodied intelligence through improved video generation and instruction alignment.
This work integrates small-molecule high-throughput screening with a deep-learning-based virtual screening approach to uncover new antibacterial compounds, illustrating a 90-fold improved hit rate over the high-throughput screening experiment used for training.
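The headline 90-fold figure is a hit-rate enrichment, i.e. the ratio of the virtual-screening hit rate to the hit rate of the underlying high-throughput screen; the numbers in the sketch below are hypothetical and chosen only to reproduce a 90-fold factor.

```python
# Illustrative arithmetic only: enrichment is the ratio of the virtual-screening hit rate
# to the hit rate of the original high-throughput screen; these counts are hypothetical.
def enrichment(hits_vs, tested_vs, hits_hts, tested_hts):
    return (hits_vs / tested_vs) / (hits_hts / tested_hts)

# e.g. 18 actives among 1,000 model-prioritized compounds vs. 200 actives among 1,000,000 screened
print(enrichment(18, 1_000, 200, 1_000_000))   # -> ~90, i.e. a 90-fold enrichment
```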
This paper introduces a generative predictive control (GPC) framework that leverages conditional flow-matching models to amortize sampling-based model predictive control (SPC) for contact-rich manipulation. By training these flow-matching models on SPC control sequences generated in simulation, the method learns proposal distributions that enable more efficient and informed sampling during online planning compared to methods relying on iterative refinement or gradient-based solvers. The approach is validated through extensive experiments in simulation and on a quadruped robot performing real-world loco-manipulation, demonstrating improved sample efficiency, reduced planning horizon requirements, and robust generalization.
Demonstrates that conditional flow-matching models can be effectively trained on noisy SPC data to generate meaningful proposal distributions, enabling efficient and robust online planning for contact-rich manipulation.
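The amortization step described above boils down to conditional flow matching on (state, SPC action sequence) pairs. The sketch below uses placeholder dimensions and a toy MLP, not the paper's model, to show the training objective and how a proposal is sampled by integrating the learned flow.

```python
# Minimal sketch of conditional flow matching on controller data (placeholder shapes/names):
# the model learns a velocity field mapping noise to SPC control sequences conditioned on
# the robot state, and is later sampled as a proposal distribution for online planning.
import torch
import torch.nn as nn

STATE_DIM, HORIZON, ACT_DIM = 32, 16, 12
net = nn.Sequential(
    nn.Linear(STATE_DIM + HORIZON * ACT_DIM + 1, 512), nn.SiLU(),
    nn.Linear(512, HORIZON * ACT_DIM),
)
opt = torch.optim.Adam(net.parameters(), lr=3e-4)

def cfm_step(state, spc_actions):
    """state: (B, STATE_DIM); spc_actions: (B, HORIZON * ACT_DIM) flattened SPC solutions."""
    noise = torch.randn_like(spc_actions)
    t = torch.rand(state.shape[0], 1)
    x_t = (1 - t) * noise + t * spc_actions              # straight-line interpolant
    target = spc_actions - noise                         # constant target velocity
    pred = net(torch.cat([state, x_t, t], dim=-1))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def propose(state, steps=8):
    """Integrate the learned flow from noise to get a proposal action sequence for SPC."""
    x = torch.randn(state.shape[0], HORIZON * ACT_DIM)
    for i in range(steps):
        t = torch.full((state.shape[0], 1), i / steps)
        x = x + net(torch.cat([state, x, t], dim=-1)) / steps
    return x.view(-1, HORIZON, ACT_DIM)
```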
The authors introduce AtomWorks, a data framework designed to streamline the development of biomolecular foundation models for tasks like structure prediction and protein design. Using AtomWorks, they trained RoseTTAFold 3 (RF3), a structure prediction network that improves chirality handling, bringing its performance closer to AlphaFold3. The release of AtomWorks, training data, and RF3 model weights under a BSD license aims to accelerate open-source biomolecular machine learning research.
Introduces AtomWorks, a comprehensive data framework, and leverages it to train RF3, a structure prediction network with enhanced chirality treatment, bridging the performance gap with closed-source models.
Audio Flamingo 2 (AF2) is introduced as an Audio-Language Model (ALM) that enhances audio understanding and reasoning by utilizing a custom CLAP model, synthetic Audio QA data, and a multi-stage curriculum learning strategy. AF2 achieves state-of-the-art performance on over 20 benchmarks with a 3B-parameter model, outperforming larger models. The work also introduces LongAudio, a new dataset for training ALMs on long audio segments (30 seconds to 5 minutes), and demonstrates exceptional performance on the LongAudioBench benchmark after fine-tuning AF2.
Introduces Audio Flamingo 2, an ALM with enhanced audio understanding and reasoning capabilities, and the LongAudio dataset and benchmark for long audio understanding.

