Jan Kautz

Publication activity (papers/week, last 8 weeks)

Research focus

Multimodal Models (5) · Architecture Design (Transformers, SSMs, MoE) (3) · Computer Vision (3) · Reasoning & Chain-of-Thought (3)

Frequent co-authors

Yonggan Fu (3) · Pavlo Molchanov (3) · Hanrong Ye (3) · Hongxu Yin (3)

Papers (9)

Jul 7, 2026

NVIDIA · 2w ago

Nemotron-Labs-Diffusion: A Tri-Mode Language Model Unifying Autoregressive, Diffusion, and Self-Speculation Decoding

Switching between autoregressive and diffusion modes allows Nemotron-Labs-Diffusion to achieve unprecedented throughput and efficiency in language modeling.

Yonggan Fu, Lexington Whalen +26

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Natural Language Processing

Jun 29, 2026

NVIDIA · 3w ago

Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis

Dynamic token editing in image synthesis could redefine how we approach high-resolution generative models.

Shufan Li, Greg Heinrich, Hanrong Ye +3

Computer Vision · Multimodal Models

Jun 18, 2026

NVIDIA · Jun 18, 2026 · also BAIR, Equal advising, JHU, NTU +5

Vesta: A Generalist Embodied Reasoning Model

A single generalist model outperforms specialized systems, achieving over 35% improvement in real-world robotic task success.

Johan Bjorck, Zhiqi Li, Yunze Man +25

Reasoning & Chain-of-Thought · Robotics & Embodied AI · World Models & Planning

Jun 16, 2026

NVIDIA · Jun 16, 2026 · also HKU, University of California

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

SR-REAL's dual-path reasoning framework allows spatial VLMs to excel in both linguistic deduction and 3D geometric inference, significantly enhancing performance on complex spatial reasoning tasks.

Yatai Ji, An-Chieh Cheng, Yang Fu +9

Multimodal Models · Reasoning & Chain-of-Thought

Jun 15, 2026

AI2 · Jun 15, 2026 · also NVIDIA

ProCUA-SFT Technical Report

Fine-tuning on the new ProCUA-SFT dataset boosts UI-TARS 7B's performance from a dismal 8-10% to an impressive 45.0% on OSWorld tasks, highlighting the critical role of high-quality training data.

Jaehun Jung, Ximing Lu, Brandon Cui +6

Data Curation & Synthetic Data · Tool Use & Agents

Jun 12, 2026

AI2 · Jun 12, 2026 · also NVIDIA, Gusu Laboratory of Materials, HKUST, NJU +5

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Achieving six times the inference throughput of current LLMs while maintaining accuracy, Nemotron 3 Ultra redefines performance benchmarks for agentic reasoning tasks.

NVIDIA, Aaron Blakeman, Aaron Thomas +554

Architecture Design (Transformers, SSMs, MoE) · Scaling Laws & Emergent Abilities · Tool Use & Agents

Jun 3, 2026

NVIDIA · Jun 3, 2026 · also UCLA, UCSD, UT Austin

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

GRAIL achieves an impressive 84% success rate in real-world object pick-up tasks using only synthetic data, revolutionizing humanoid robot training.

Tianyi Xie, Haotian Zhang, Jinhyung Park +17

Multimodal Models · Robotics & Embodied AI

May 28, 2026

NVIDIA · May 28, 2026 · also Beihang, HKU, UCSD, University of California

Grounded 3D-Aware Spatial Vision-Language Modeling

Grounding boosts spatial reasoning in VLMs: explicitly linking language to 2D and 3D scene elements lets models decompose complex spatial problems and improve performance even on non-grounded tasks.

An-Chieh Cheng, Yang Fu +21

Computer Vision · Multimodal Models · Reasoning & Chain-of-Thought

Mar 12, 2026

NVIDIA · Mar 12, 2026 · also BAIR, MIT CSAIL, Clarifai, K-frame

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

MLLMs can now handle 4K videos up to 100x faster thanks to AutoGaze, which selectively attends to only the most informative patches.

Baifeng Shi, Stephanie Fu, Long Lian +12

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models
