M. Shoeybi

Audex achieves state-of-the-art audio understanding and generation while maintaining the reasoning prowess of its text-only foundation, all through a unified architecture.

Zhifeng Kong, Sang-gil Lee, JaeHyeon Kim +17

Multimodal Models Speech & Audio

Jun 1, 2026

NVIDIA · Jun 1, 2026 · also BAIR, Galbot, Georgia Tech, HKUST +9

Cosmos 3: Omnimodal World Models for Physical AI

Cosmos 3 sets a new benchmark for omnimodal models, outperforming the existing state of the art on both Text-to-Image and Image-to-Video tasks.

Aditi, Niket Agarwal, Arslan Ali +285

Multimodal Models Robotics & Embodied AI World Models & Planning

Apr 27, 2026

NVIDIA · Apr 27, 2026 · also Amazon Science, Microsoft Research, UW, Music X Lab +1

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Multimodal models can now achieve state-of-the-art performance in real-world tasks like document understanding and audio-video comprehension with significantly reduced inference latency thanks to novel token-reduction techniques.

NVIDIA, Amala Sanjay Deshmukh, K. Chumachenko, Tuomas Rintamaki +208

Architecture Design (Transformers, SSMs, MoE) Multimodal Models Speech & Audio

Apr 13, 2026

NVIDIA · Apr 13, 2026 · also IIT Delhi, Indraprastha Institute of Information, Jaypee Institute of Information, UMD

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Audio-language models can now reason about 30-minute-long audio clips with timestamp-grounded intermediate steps, unlocking a new level of fine-grained understanding.

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar +17

Multimodal Models Open-Source Models & Weights Speech & Audio

Mar 19, 2026

NVIDIA · Mar 19, 2026 · also HKUST, Samsung, Waterloo

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

A 30B MoE model can now achieve Gold Medal-level performance in IMO, IOI, and ICPC, rivaling frontier models with 20x more parameters.

Zhuoling Yang, Zhuolin Yang, Yang Chen +23

Code Generation & Program Synthesis Reasoning & Chain-of-Thought RLHF & Preference Learning

Feb 24, 2026

NVIDIA · Feb 24, 2026

On Data Engineering for Scaling LLM Terminal Capabilities

Forget hand-crafted datasets: a new synthetic data pipeline lets smaller LLMs beat giants at terminal tasks.