Jan Kautz

Grounding boosts spatial reasoning in VLMs: explicitly linking language to 2D and 3D scene elements lets models decompose complex spatial problems and improve performance even on non-grounded tasks.

An-Chieh Cheng, Yang Fu +22

Computer Vision · Multimodal Models · Reasoning & Chain-of-Thought

May 26, 2026

NVIDIA · 3w ago

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Ditch slow, token-by-token box generation: LocateAnything's Parallel Box Decoding (PBD) boosts VLM grounding speed and accuracy by decoding entire bounding boxes at once.

Shihao Wang, Shilong Liu, Yuanguo Kuang +11

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

May 21, 2026

AI2 · May 21, 2026 · also NVIDIA

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Forget everything you thought you knew about linear attention: decoupling erase and write operations unlocks significantly better long-context retrieval.

Ali Hatamizadeh, Jan Kautz

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Apr 27, 2026

NVIDIA · Apr 27, 2026 · also Amazon Science, Microsoft Research, UW, Music X Lab +1

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Multimodal models can now achieve state-of-the-art performance in real-world tasks like document understanding and audio-video comprehension with significantly reduced inference latency thanks to novel token-reduction techniques.

NVIDIA, Amala Sanjay Deshmukh, K. Chumachenko, Tuomas Rintamaki +209

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Speech & Audio

Feb 17, 2026

NVIDIA · Feb 17, 2026 · also BAIR, Ant Digital Technologies, Huawei, KAIST +2

World Action Models are Zero-shot Policies

Forget painstakingly engineering robot behaviors: DreamZero learns directly from video of other robots or even humans, adapting to new tasks and bodies with just minutes of data.

Seonghyeon Ye, Yunhao Ge +49

Multimodal Models · Robotics & Embodied AI · World Models & Planning

Papers (7)