Achieve 49% and 19% lower Chamfer distance than state-of-the-art dynamic surface reconstruction methods on the Hi4D and CMU Panoptic datasets, respectively, by enforcing temporal consistency in Gaussian Splatting.
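For readers unfamiliar with the metric above: Chamfer distance averages nearest-neighbor distances between two point sets in both directions. A minimal NumPy sketch of one common convention (mean of squared nearest-neighbor distances, summed over both directions); conventions vary across papers, and this is not the implementation used in the work above:

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Averages squared nearest-neighbor distances from a to b and from
    b to a, then sums the two directions.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

Lower is better: identical point sets score exactly zero, which is why the blurb's "49% lower" reads as an improvement.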
Humanoid robots can now traverse complex terrains with human-like gaits, thanks to a surprisingly simple and efficient framework that eschews adversarial training.
Robots can now manipulate objects with greater dexterity and adaptability thanks to a new world model that leverages both vision and high-frequency tactile feedback to predict and react to contact dynamics.
By explicitly reasoning in 3D, VolumeDP leaps ahead of 2D-based imitation learning methods, achieving a remarkable 14.8% improvement on the LIBERO benchmark and robust real-world generalization.
Ditch fixed compute budgets: this new flow-matching method for robotic control adaptively allocates computation, speeding up simple tasks and focusing on complex ones.
Stop wrestling with incompatible human body models: SOMA lets you mix and match SMPL, SMPL-X, and more, unlocking the power of diverse datasets in a single, differentiable pipeline.
Forget expensive real-world data collection: a massive, diverse synthetic dataset enables surprisingly effective zero-shot transfer for robotic manipulation.
A hybrid cuVSLAM-based visual SLAM system outperforms other VO/VSLAM approaches in mapping accuracy across real-world logistics environments.
World Action Models can ditch the slow, iterative "imagine-then-execute" loop at test time without sacrificing performance, achieving a 4x speedup.
Humanoid robots can now handle heavy, unknown payloads in the real world thanks to a system that identifies mass distribution via differentiable simulation.
Kimodo leaps ahead in controllable human motion generation by training a diffusion model on a massive 700-hour mocap dataset, enabling unprecedented control fidelity via text and diverse kinematic constraints.
Forget slow, model-dependent curation: FAKTUAL offers a fast, model-free way to boost robot imitation learning by directly maximizing the entropy of demonstration datasets.
Achieve a 40% jump in success rates on real-world contact-rich manipulation by intelligently scheduling force feedback into visual-motor policies.
Forget predefined areas of interest: this multi-agent exploration framework uses Gaussian belief mapping to adaptively balance scientific discovery and safety in hazardous off-world environments.
Human-robot teams can slash interaction costs by 50% and task times by 25% when robots actively resolve uncertainty about tasks and infer human intent using LLMs and spatial reasoning.
By combining video generation and vision-language models, EmboAlign achieves a 43% boost in real-world robot manipulation success without any task-specific training.
Training generalist robots just got a whole lot easier: RoboCasa365 offers a massive, diverse, and reproducible benchmark for household mobile manipulation.
Forget simulated manipulation: ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.
Forget everything you thought you knew about continual learning: pretrained Vision-Language-Action models can learn new robotic skills without catastrophic forgetting, even with minimal replay.
Learning robotic reward functions from a million trajectories reveals that comparing entire trajectories, not just individual frames, unlocks better generalization and learning from suboptimal data.
Achieve up to 28% better success rates in whole-body mobile manipulation by decoupling base and arm control while intelligently allocating perceptual attention.
ShallowConvNet emerges as a surprisingly effective architecture for decoding user intent from EEG signals in real-world robotic control, outperforming more complex models like Transformers.
Unlock robot learning with hidden knowledge: TOPReward extracts surprisingly accurate task progress signals directly from VLM token probabilities, bypassing the need for explicit reward engineering.
Forget painstakingly engineering robot behaviors: DreamZero learns directly from video of other robots or even humans, adapting to new tasks and bodies with just minutes of data.
Forget robotics pre-training: ActionCodec, a new action tokenizer designed with information-theoretic principles, achieves state-of-the-art VLA performance on LIBERO.
Forget complex architectures: RaCo achieves SOTA keypoint matching and repeatability by cleverly combining ranking and covariance estimation in a lightweight network, trained without covisible image pairs.
Forget static datasets: RL-based co-training unlocks +20% real-world VLA performance by interactively leveraging simulation while preserving real-world capabilities.
Training a robot foundation model on 30,000 hours of heterogeneous embodied data lets it outperform prior methods by up to 48% on complex manipulation tasks and even benefit from low-quality data.
Forget tedious manual segmentation: ArtisanGS lets you lasso objects in 3D Gaussian Splats with AI-powered 2D selections that propagate into 3D, giving you unprecedented control over editing.
Forget synthetic data that looks like it came from a PS2 game: NVIDIA's new Cosmos-Predict2.5 generates high-fidelity videos for training embodied AI, opening the door to more realistic and reliable simulations.
Ditch slow iterative refinement: conditional flow-matching models can directly learn meaningful proposal distributions from noisy sampling-based MPC data, slashing planning time.
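The flow-matching idea behind blurbs like the one above: train a network to regress the velocity of a probability path from noise to data, so sampling becomes integrating an ODE rather than iterative refinement. A toy sketch of constructing one training example under generic assumptions (linear "rectified" path, Gaussian noise endpoint); the specific conditioning and MPC data pipeline are the paper's, not shown here:

```python
import numpy as np

def flow_matching_pair(x1: np.ndarray, rng: np.random.Generator):
    """Build one conditional flow-matching regression example.

    Linear path: x_t = (1 - t) * x0 + t * x1, with noise endpoint
    x0 ~ N(0, I). The target for a velocity network v_theta(x_t, t)
    is the constant path velocity x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)   # noise endpoint
    t = rng.uniform()                    # random time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1       # point on the path
    v_target = x1 - x0                   # velocity regression target
    return x_t, t, v_target
```

At inference, integrating the learned velocity field from t = 0 to t = 1 carries a noise sample to a data sample in a fixed, small number of steps, which is where the planning-time savings come from.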
Imagine training robots to manipulate objects in the real world, but entirely within a high-fidelity, diffusion-based dream.