Cornell

Mar 17, 2026

arXiv:2603.16742

When the City Teaches the Car: Label-Free 3D Perception from Infrastructure

Zhen Xu, Jinsu Yoo, Cristian Bautista, Zanming Huang, Tai-Yu Pan, Zhenzhen Liu, Katie Z Luo, Mark Campbell, Bharath Hariharan, Wei-Lun Chao

AI Summary

This paper introduces infrastructure-taught, label-free 3D perception, a paradigm in which roadside units (RSUs) learn 3D detectors from unlabeled data and broadcast their predictions to passing vehicles, which aggregate them as pseudo-labels for training a standalone ego detector. The authors implement the idea as a three-stage pipeline and evaluate it with CenterPoint in a CARLA-based multi-agent environment. The pipeline reaches 82.3% AP for vehicle detection, compared with 94.4% for a fully supervised ego detector, suggesting that city infrastructure could serve as a scalable supervisory signal for autonomous vehicles.

Key Contribution

Cities teaching cars to see: this work presents a label-free 3D perception pipeline in which roadside sensors supervise the training of an ego-vehicle detector, reaching 82.3% AP for vehicle detection in a CARLA simulation without manual annotation.

Abstract

Building robust 3D perception for self-driving still relies heavily on large-scale data collection and manual annotation, yet this paradigm becomes impractical as deployment expands across diverse cities and regions. Meanwhile, modern cities are increasingly instrumented with roadside units (RSUs), static sensors deployed along roads and at intersections to monitor traffic. This raises a natural question: can the city itself help train the vehicle? We propose infrastructure-taught, label-free 3D perception, a paradigm in which RSUs act as stationary, unsupervised teachers for ego vehicles. Leveraging their fixed viewpoints and repeated observations, RSUs learn local 3D detectors from unlabeled data and broadcast predictions to passing vehicles, which are aggregated as pseudo-label supervision for training a standalone ego detector. The resulting model requires no infrastructure or communication at test time. We instantiate this idea as a fully label-free three-stage pipeline and conduct a concept-and-feasibility study in a CARLA-based multi-agent environment. With CenterPoint, our pipeline achieves 82.3% AP for detecting vehicles, compared to a fully supervised ego upper bound of 94.4%. We further systematically analyze each stage, evaluate its scalability, and demonstrate complementarity with existing ego-centric label-free methods. Together, these results suggest that city infrastructure itself can potentially provide a scalable supervisory signal for autonomous vehicles, positioning infrastructure-taught learning as a promising orthogonal paradigm for reducing annotation cost in 3D perception.
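The aggregation step described in the abstract, where predictions broadcast by RSUs become pseudo-labels in the ego vehicle's frame, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Box` and `Pose` types, the planar (x, y, yaw) poses, the score threshold `min_score`, and the distance-based merge with `merge_radius` are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Planar pose of a sensor in a shared world frame (hypothetical type)."""
    x: float
    y: float
    yaw: float


@dataclass
class Box:
    """Simplified detection: box center, heading, and confidence (hypothetical type)."""
    x: float
    y: float
    yaw: float
    score: float


def to_world(box: Box, pose: Pose) -> Box:
    """Map a box from a sensor's local frame into the world frame."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return Box(pose.x + c * box.x - s * box.y,
               pose.y + s * box.x + c * box.y,
               box.yaw + pose.yaw, box.score)


def to_local(box: Box, pose: Pose) -> Box:
    """Map a box from the world frame into a sensor's local frame."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    dx, dy = box.x - pose.x, box.y - pose.y
    return Box(c * dx + s * dy, -s * dx + c * dy,
               box.yaw - pose.yaw, box.score)


def aggregate_pseudo_labels(rsu_broadcasts, ego_pose,
                            min_score=0.5, merge_radius=1.0):
    """Collect RSU predictions in the ego frame and merge near-duplicates.

    rsu_broadcasts: list of (rsu_pose, boxes) pairs, boxes in the RSU frame.
    min_score and merge_radius are assumed values, not taken from the paper.
    """
    candidates = []
    for rsu_pose, boxes in rsu_broadcasts:
        for box in boxes:
            if box.score >= min_score:  # drop low-confidence predictions
                candidates.append(to_local(to_world(box, rsu_pose), ego_pose))

    # Greedy merge: keep the highest-scoring box within each neighborhood.
    candidates.sort(key=lambda b: b.score, reverse=True)
    kept = []
    for box in candidates:
        if all(math.hypot(box.x - k.x, box.y - k.y) > merge_radius for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    # Two RSUs observe the same vehicle from opposite sides of an intersection.
    broadcasts = [
        (Pose(10.0, 0.0, 0.0), [Box(2.0, 0.0, 0.0, 0.9), Box(1.0, 1.0, 0.0, 0.2)]),
        (Pose(20.0, 0.0, math.pi), [Box(7.8, 0.0, 0.0, 0.7)]),
    ]
    for label in aggregate_pseudo_labels(broadcasts, Pose(0.0, 0.0, 0.0)):
        print(label)
```

In this sketch the two RSUs report the same vehicle, so only the higher-confidence box survives as a pseudo-label. The resulting boxes would then serve as training targets for the ego detector, which needs no RSU input at test time.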

Computer Vision, Data Curation & Synthetic Data, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
