Stanford HAI | Cornell | Mar 9, 2026 | arXiv:2603.07985

On the Feasibility and Opportunity of Autoregressive 3D Object Detection

Zanming Huang, Jinsu Yoo, Sooyoung Jeon, Zhenzhen Liu, Mark Campbell, Kilian Q Weinberger, Bharath Hariharan, Wei-Lun Chao, Katie Z Luo

AI Summary

AutoReg3D is an autoregressive 3D object detector that formulates detection as sequence generation, emitting objects in a near-to-far order based on point-cloud features. This approach eliminates the need for hand-crafted components like anchor assignment and NMS, simplifying training and improving extensibility. The model achieves competitive performance on the nuScenes dataset and makes it possible to apply language-model advances, such as GRPO-style reinforcement learning, to 3D perception tasks.

Key Contribution

Ditch the anchors and NMS: AutoReg3D reimagines 3D object detection as a sequence generation problem, opening the door for language-model techniques in 3D perception.

Abstract

LiDAR-based 3D object detectors typically rely on proposal heads with hand-crafted components like anchor assignment and non-maximum suppression (NMS), complicating training and limiting extensibility. We present AutoReg3D, an autoregressive 3D detector that casts detection as sequence generation. Given point-cloud features, AutoReg3D emits objects in a range-causal (near-to-far) order and encodes each object as a short, discrete-token sequence consisting of its center, size, orientation, velocity, and class. This near-to-far ordering mirrors LiDAR geometry (near objects occlude far ones, but not vice versa), enabling straightforward teacher forcing during training and autoregressive decoding at test time. AutoReg3D is compatible with diverse point-cloud backbones and attains competitive nuScenes performance without anchors or NMS. Beyond parity, the sequential formulation unlocks language-model advances for 3D perception, including GRPO-style reinforcement learning for task-aligned objectives. These results position autoregressive decoding as a viable, flexible alternative for LiDAR-based detection and open a path to importing modern sequence-modeling tools into 3D perception.
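
To make the near-to-far, discrete-token formulation concrete, the following is a minimal sketch of how ground-truth boxes might be serialized into a target sequence for teacher forcing. It is not the authors' implementation: the bin count, value ranges, the use of bird's-eye-view distance as the range measure, the ten-tokens-per-object layout, and the names `Box`, `quantize`, `box_to_tokens`, and `serialize_scene` are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the paper.
NUM_BINS = 256
XY_RANGE = (-54.0, 54.0)    # metres, assumed detection range
Z_RANGE = (-5.0, 3.0)       # metres
SIZE_RANGE = (0.0, 20.0)    # metres
YAW_RANGE = (-math.pi, math.pi)
VEL_RANGE = (-20.0, 20.0)   # metres per second


@dataclass
class Box:
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float
    vx: float
    vy: float
    cls: int  # integer class index


def quantize(value, lo, hi, num_bins=NUM_BINS):
    """Clamp value to [lo, hi] and map it to an integer bin in [0, num_bins - 1]."""
    clamped = min(max(value, lo), hi)
    frac = (clamped - lo) / (hi - lo)
    return min(int(frac * num_bins), num_bins - 1)


def box_to_tokens(box):
    """Encode one box as 10 tokens: center (3), size (3), yaw, velocity (2), class."""
    # Wrap yaw into [-pi, pi) before binning.
    yaw = (box.yaw + math.pi) % (2.0 * math.pi) - math.pi
    return [
        quantize(box.x, *XY_RANGE),
        quantize(box.y, *XY_RANGE),
        quantize(box.z, *Z_RANGE),
        quantize(box.length, *SIZE_RANGE),
        quantize(box.width, *SIZE_RANGE),
        quantize(box.height, *SIZE_RANGE),
        quantize(yaw, *YAW_RANGE),
        quantize(box.vx, *VEL_RANGE),
        quantize(box.vy, *VEL_RANGE),
        box.cls,  # a shared vocabulary would likely offset this; kept raw here
    ]


def serialize_scene(boxes):
    """Sort boxes near-to-far by distance from the sensor origin, then flatten."""
    ordered = sorted(boxes, key=lambda b: math.hypot(b.x, b.y))
    tokens = []
    for box in ordered:
        tokens.extend(box_to_tokens(box))
    return tokens


far = Box(30.0, 10.0, 0.0, 4.5, 1.9, 1.6, 0.3, 2.0, 0.0, cls=0)
near = Box(5.0, -2.0, 0.0, 0.8, 0.8, 1.7, 1.2, 0.5, 0.1, cls=1)
sequence = serialize_scene([far, near])
```

Under these assumptions, the nearer object's tokens come first in `sequence` regardless of input order, so a decoder trained with teacher forcing always conditions on closer objects when predicting farther ones.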

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
