This paper introduces a bootstrap perception system for indoor robot navigation that addresses hardware depth sensor failure by fusing LiDAR, hardware depth, and learned monocular depth. The system leverages the valid depth pixels from the failing sensor to calibrate a learned monocular depth model, effectively filling in depth gaps without external data. Experiments in corridor and dynamic pedestrian environments demonstrate a 55-110% increase in costmap obstacle coverage compared to LiDAR alone, with a distilled student model achieving comparable navigation performance to ground-truth depth at a fraction of the computational cost.
Even with 78% depth pixel loss, robots can navigate effectively by bootstrapping perception from the sensor's *own* sparse, valid data to calibrate learned monocular depth.
We present a bootstrap perception system for indoor robot navigation under hardware depth failure. In our corridor data, the time-of-flight camera loses up to 78% of its depth pixels on reflective surfaces, yet a 2D LiDAR alone cannot sense obstacles above its scan plane. Our system exploits a self-referential property of this failure: the sensor's surviving valid pixels calibrate learned monocular depth to metric scale, so the system fills its own gaps without external data. The architecture forms a failure-aware sensing hierarchy, conservative when sensors work and filling in when they fail: LiDAR remains the geometric anchor, hardware depth is kept where valid, and learned depth enters only where needed. In corridor and dynamic pedestrian evaluations, selective fusion increases costmap obstacle coverage by 55-110% over LiDAR alone. A compact distilled student runs at 218 FPS on a Jetson Orin Nano and achieves 9/10 navigation success with zero collisions in closed-loop simulation, matching the ground-truth depth baseline at a fraction of the foundation model's cost.
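The abstract does not spell out how the surviving valid pixels calibrate the monocular model to metric scale. One plausible reading is a per-frame least-squares scale/shift alignment between the monocular prediction and the sensor's valid depth; the sketch below illustrates that idea in NumPy. All names (`calibrate_monocular_depth`, the affine model `sensor ≈ a·mono + b`) are illustrative assumptions, not the paper's stated method.

```python
import numpy as np

def calibrate_monocular_depth(mono_depth, sensor_depth, valid_mask):
    """Illustrative sketch: fit a per-frame scale `a` and shift `b` so the
    monocular prediction matches the sensor's surviving valid pixels, then
    use the calibrated prediction to fill the sensor's invalid pixels.

    mono_depth   : (H, W) relative depth from a learned monocular model
    sensor_depth : (H, W) hardware depth, unreliable where valid_mask is False
    valid_mask   : (H, W) bool, True at pixels the sensor still measures
    """
    m = mono_depth[valid_mask].ravel()
    s = sensor_depth[valid_mask].ravel()
    # Least-squares fit of s ≈ a*m + b over the valid pixels only.
    A = np.stack([m, np.ones_like(m)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, s, rcond=None)
    # Keep hardware depth where valid; fill gaps with the calibrated prediction.
    fused = np.where(valid_mask, sensor_depth, a * mono_depth + b)
    return fused, a, b
```

This matches the "self-referential" framing in the abstract: no external calibration target is needed, only the failing sensor's own valid returns. A robustified variant (e.g. RANSAC or a median-based fit) would be the natural next step when valid pixels are themselves noisy.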