AutoReg3D is an autoregressive 3D object detector that formulates detection as sequence generation, emitting objects in near-to-far order from point-cloud features. This approach removes hand-crafted components such as anchor assignment and NMS, which simplifies training and improves extensibility. The model achieves competitive performance on the nuScenes dataset and makes it possible to apply language-model advances, such as GRPO-style reinforcement learning, to 3D perception tasks.
Ditch the anchors and NMS: AutoReg3D reimagines 3D object detection as a sequence generation problem, opening the door for language-model techniques in 3D perception.
LiDAR-based 3D object detectors typically rely on proposal heads with hand-crafted components such as anchor assignment and non-maximum suppression (NMS), which complicate training and limit extensibility. We present AutoReg3D, an autoregressive 3D detector that casts detection as sequence generation. Given point-cloud features, AutoReg3D emits objects in a range-causal (near-to-far) order and encodes each object as a short sequence of discrete tokens representing its center, size, orientation, velocity, and class. This near-to-far ordering mirrors LiDAR geometry (near objects occlude far ones, but not vice versa), enabling straightforward teacher forcing during training and autoregressive decoding at test time. AutoReg3D is compatible with diverse point-cloud backbones and attains competitive nuScenes performance without anchors or NMS. Beyond parity, the sequential formulation unlocks language-model advances for 3D perception, including GRPO-style reinforcement learning for task-aligned objectives. These results position autoregressive decoding as a viable, flexible alternative for LiDAR-based detection and open a path to importing modern sequence-modeling tools into 3D perception.
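To make the serialization idea concrete, here is a minimal Python sketch of how ground-truth boxes might be turned into a near-to-far token sequence, paired for teacher forcing, and parsed back after generation. Everything specific here is an illustrative assumption, not a detail from the abstract: the bin count `NUM_BINS`, the coordinate ranges, the class list, the token layout with `BOS`/`EOS`, and the use of bird's-eye-view distance from the ego vehicle as "range". The helper names (`encode_box`, `serialize_scene`, `teacher_forcing_pair`, `decode_sequence`) are hypothetical.

```python
import math
from dataclasses import dataclass

# --- Hypothetical tokenization settings (assumptions, not from the paper) ---
NUM_BINS = 256                # uniform bins per continuous attribute
XY_RANGE = (-51.2, 51.2)      # metres, bird's-eye-view extent
Z_RANGE = (-5.0, 3.0)
SIZE_RANGE = (0.0, 12.0)
VEL_RANGE = (-20.0, 20.0)     # m/s
CLASSES = ["car", "truck", "pedestrian", "bicycle"]  # illustrative subset

# Token ids [0, NUM_BINS) are attribute bins; class tokens follow; then BOS/EOS.
CLASS_OFFSET = NUM_BINS
BOS = CLASS_OFFSET + len(CLASSES)
EOS = BOS + 1
TOKENS_PER_OBJECT = 10        # center(3) + size(3) + yaw(1) + velocity(2) + class(1)


@dataclass
class Box:
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    yaw: float
    vx: float
    vy: float
    label: str


def quantize(value, lo, hi, bins=NUM_BINS):
    """Clamp value to [lo, hi] and map it to an integer bin in [0, bins)."""
    value = min(max(value, lo), hi)
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def dequantize(idx, lo, hi, bins=NUM_BINS):
    """Return the centre of bin idx."""
    return lo + (idx + 0.5) * (hi - lo) / bins


def encode_box(b):
    """Encode one object as TOKENS_PER_OBJECT discrete tokens."""
    yaw = math.atan2(math.sin(b.yaw), math.cos(b.yaw))  # wrap to [-pi, pi]
    return [
        quantize(b.x, *XY_RANGE), quantize(b.y, *XY_RANGE), quantize(b.z, *Z_RANGE),
        quantize(b.l, *SIZE_RANGE), quantize(b.w, *SIZE_RANGE), quantize(b.h, *SIZE_RANGE),
        quantize(yaw, -math.pi, math.pi),
        quantize(b.vx, *VEL_RANGE), quantize(b.vy, *VEL_RANGE),
        CLASS_OFFSET + CLASSES.index(b.label),
    ]


def serialize_scene(boxes):
    """Order objects near-to-far by BEV distance from the ego vehicle, then flatten."""
    seq = [BOS]
    for b in sorted(boxes, key=lambda o: math.hypot(o.x, o.y)):
        seq.extend(encode_box(b))
    seq.append(EOS)
    return seq


def teacher_forcing_pair(seq):
    """Model input drops the last token; the target is the same sequence shifted by one."""
    return seq[:-1], seq[1:]


def decode_sequence(seq):
    """Parse a generated sequence into (x, y, label) tuples, stopping at EOS."""
    objects = []
    body = seq[1:] if seq and seq[0] == BOS else seq
    for i in range(0, len(body), TOKENS_PER_OBJECT):
        chunk = body[i:i + TOKENS_PER_OBJECT]
        if len(chunk) < TOKENS_PER_OBJECT or EOS in chunk:
            break
        x = dequantize(chunk[0], *XY_RANGE)
        y = dequantize(chunk[1], *XY_RANGE)
        objects.append((x, y, CLASSES[chunk[-1] - CLASS_OFFSET]))
    return objects


# Usage: a far car and a near pedestrian; the pedestrian is emitted first.
scene = [
    Box(20.0, 5.0, 0.0, 4.5, 1.9, 1.6, 0.1, 3.0, 0.0, "car"),
    Box(3.0, -1.0, 0.0, 0.6, 0.6, 1.7, 1.5, 0.0, 0.0, "pedestrian"),
]
seq = serialize_scene(scene)
inputs, targets = teacher_forcing_pair(seq)
print(decode_sequence(seq))
```

One motivation for quantizing every attribute into a shared integer vocabulary, as sketched here, is that a standard next-token cross-entropy objective then applies unchanged, and sequence-level fine-tuning can operate on the same tokens. A real detector might instead use finer or non-uniform bins, per-attribute vocabularies, or a different definition of range.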