Song Han

Jet-Long achieves up to 1.39x throughput improvements while maintaining accuracy across long-context tasks, setting a new standard for zero-shot context extension in LLMs.

Haozhan Tang, Zerui Wang +7

Recommendation & Information Retrieval · Tool Use & Agents

Jun 1, 2026

Qixin Hu +1 · Jun 1, 2026

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

LongLive-RAG transforms long video generation by enabling the use of a searchable memory of past latents, drastically reducing error accumulation.

Qixin Hu, Song Han

Computer Vision · Recommendation & Information Retrieval

NVIDIA · Jun 1, 2026 · also BAIR, Galbot, Georgia Tech, HKUST +9

Cosmos 3: Omnimodal World Models for Physical AI

Cosmos 3 sets a new benchmark for omnimodal models, outperforming the existing state of the art on both Text-to-Image and Image-to-Video tasks.

Aditi, Niket Agarwal, Arslan Ali +285

Multimodal Models · Robotics & Embodied AI · World Models & Planning

May 28, 2026

Yuyang Zhao +6 · May 28, 2026

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

Real-time, high-resolution video editing is now possible on a single consumer GPU, thanks to a novel hybrid diffusion transformer and system-level optimizations that achieve 24 FPS at 1280x704.

Yuyang Zhao, Yicheng Pan, Qiyuan He +4

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

NVIDIA · May 28, 2026 · also Beihang, HKU, UCSD, University of California

Grounded 3D-Aware Spatial Vision-Language Modeling

Grounding boosts spatial reasoning in VLMs: explicitly linking language to 2D and 3D scene elements lets models decompose complex spatial problems and improve performance even on non-grounded tasks.

An-Chieh Cheng, Yang Fu +21

Computer Vision · Multimodal Models · Reasoning & Chain-of-Thought

May 22, 2026

Kewei Zhang +8 · May 22, 2026

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

By structuring diffusion-based driving models around a "scaffold" of frozen structural tokens, Fast-dDrive achieves a 12x speedup over autoregressive baselines while improving trajectory accuracy.

Kewei Zhang, Sensen Gao, Yulong Cao +6

Inference & Quantization · Multimodal Models · Robotics & Embodied AI
