Kurt Keutzer

UC Berkeley, UT Austin, Stanford University, Princeton University, Together AI

Research focus

Inference & Quantization (4) · Architecture Design (Transformers, SSMs, MoE) (3) · Training Efficiency & Optimization (3) · Computer Vision (2)

Frequent co-authors

Haocheng Xi (3) · Monishwaran Maheswaran (2) · Junxiong Wang (2) · Coleman Hooper (2)

Papers (7)

Apr 9, 2026

BAIR · also Sydney, Together

Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution

Verifier-free evolution can now match or exceed the performance of verifier-based methods, while slashing API costs by 3x and boosting throughput by 10x, thanks to a clever model orchestration strategy.

Monishwaran Maheswaran, Leon Lakhani, Zhongzhu Zhou +17

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Mar 10, 2026

BAIR · also NVIDIA, Tsinghua AI, Soyeon Caren Han is the corresponding

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

K-means, previously relegated to offline processing, gets a 17.9x speed boost on modern GPUs thanks to Flash-KMeans' clever IO and contention optimizations.

Shuo Yang, Shuo Yang, Haocheng Xi +16

Distributed Systems & Hardware · Inference & Quantization · Training Efficiency & Optimization

Mar 9, 2026

BAIR · also Tsinghua AI, Soyeon Caren Han is the corresponding

SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing

Get 2x faster video generation from diffusion transformers without sacrificing quality, thanks to a clever parameter-free error compensation technique.

Xuanyi Zhou, Qiuyang Mang, Shuo Yang +10

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Mar 4, 2026

BAIR · also NVIDIA, K-frame, Together

$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

Models are substantially better at pairwise self-verification than independent scoring, unlocking a more efficient and accurate approach to test-time scaling for complex reasoning.

Harman Singh, Xiuyu Li, Kusha Sareen +14

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought

Feb 12, 2026

BAIR

Agentic Test-Time Scaling for WebAgents

Uncertainty-driven dynamic compute allocation lets web agents outperform naive test-time scaling by 9.1% while using 2.3x fewer tokens.

Lutfi Eren Erdogan, Chris Joseph John, Surya Krishnapillai +3

Eval Frameworks & Benchmarks · Inference & Quantization · Tool Use & Agents

May 28, 2025

also BAIR, Project leader

GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control

By explicitly encoding 3D geometry, GeoDrive achieves more realistic and controllable autonomous driving scene modeling, outperforming prior world models in action accuracy and spatial awareness.

Anthony Chen, Wenzhao Zheng, Yida Wang +59

Computer Vision · Robotics & Embodied AI · World Models & Planning

Feb 5, 2025

Rishabh Tiwari +9 · also BAIR

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache

Forget sparse KV caches – QuantSpec's hierarchical 4-bit quantization unlocks 2.5x speedups in long-context LLM inference with >90% acceptance rates.

Rishabh Tiwari, Haocheng Xi, Aditya Tomar +7

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization
