This paper introduces ActMVS, a framework for monocular active scene reconstruction that lets robots and UAVs autonomously build high-quality occupancy maps in real time without depth sensors. By combining view factor graph construction with global depth optimization, ActMVS produces globally consistent dense depth maps suitable for safe trajectory planning. Experiments on the Replica dataset show that ActMVS performs competitively with existing RGB-D methods, highlighting its potential for cost-effective robotic navigation.
Monocular robots can now achieve real-time, high-confidence occupancy mapping, rivaling traditional RGB-D systems without the added weight and cost of depth sensors.
Active scene reconstruction enables robots and UAVs to autonomously plan trajectories and reconstruct environments without costly manual data acquisition. Unlike passive methods, active reconstruction requires building high-confidence occupancy maps in real time for collision-free navigation. Existing approaches rely on depth sensors to update these occupancy maps, which increases platform cost and weight. To advance spatial intelligence, we aim for a vision-only monocular solution. However, current monocular scene reconstruction methods operate offline and fail to deliver globally consistent dense depth at the frame rates required for robot and UAV navigation. To bridge this gap, we introduce ActMVS, the first framework for monocular active reconstruction. Our framework combines view factor graph construction, which informs Multi-View Stereo (MVS) depth prediction, with global depth optimization to generate high-quality, globally consistent dense depth maps online. This allows monocular robots and UAVs to maintain reliable occupancy maps for safe trajectory planning during reconstruction. Experiments on the Replica dataset demonstrate performance competitive with RGB-D methods. Our code and data are available at https://github.com/TrickyGo/ActMVS.
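The abstract notes that occupancy maps are normally updated from depth-sensor readings, whereas ActMVS supplies dense depth from monocular MVS instead. To illustrate only that downstream step, here is a minimal sketch of a standard log-odds occupancy-grid update fed by one row of a dense depth map. It is not ActMVS's implementation. The 2D grid, the pinhole-camera geometry, the log-odds constants, and the function names (`bresenham`, `update_cell`, `integrate_depth_row`, `occupancy_prob`) are all hypothetical choices made for this illustration.

```python
import math
from collections import defaultdict

# Hypothetical log-odds parameters (illustrative only); real systems tune these.
L_OCC = 0.85              # added to the cell where a depth ray terminates
L_FREE = -0.4             # added to each cell the ray passes through
L_MIN, L_MAX = -2.0, 3.5  # clamping keeps cells able to change later


def bresenham(x0, y0, x1, y1):
    """Return the integer grid cells on the segment (x0, y0) -> (x1, y1), inclusive."""
    cells = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return cells
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def update_cell(grid, cell, delta):
    """Add a log-odds increment to one cell and clamp it to [L_MIN, L_MAX]."""
    grid[cell] = min(L_MAX, max(L_MIN, grid[cell] + delta))


def integrate_depth_row(grid, cam_xy, cam_yaw, depths, fov, resolution):
    """Fuse one row of a dense depth map into a 2D log-odds occupancy grid.

    Assumes a pinhole camera whose horizontal field of view `fov` (radians)
    spans the row; `depths` holds per-pixel z-depth in metres (<= 0 = invalid).
    """
    n = len(depths)
    focal = (n / 2) / math.tan(fov / 2)  # focal length in pixels
    cx = math.floor(cam_xy[0] / resolution)
    cy = math.floor(cam_xy[1] / resolution)
    for i, depth in enumerate(depths):
        if depth <= 0:
            continue  # skip pixels without a valid depth estimate
        # Bearing of the pixel centre relative to the optical axis (right = positive).
        theta = math.atan(((i + 0.5) - n / 2) / focal)
        rng = depth / math.cos(theta)  # convert z-depth to range along the ray
        angle = cam_yaw - theta        # world angle, counter-clockwise positive
        ex = math.floor((cam_xy[0] + rng * math.cos(angle)) / resolution)
        ey = math.floor((cam_xy[1] + rng * math.sin(angle)) / resolution)
        ray = bresenham(cx, cy, ex, ey)
        for cell in ray[:-1]:
            update_cell(grid, cell, L_FREE)  # space in front of the surface is free
        update_cell(grid, ray[-1], L_OCC)    # the observed surface is occupied


def occupancy_prob(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


grid = defaultdict(float)  # unknown cells start at log-odds 0 (p = 0.5)
integrate_depth_row(grid, (0.0, 0.0), 0.0, [1.0], math.radians(60), 0.1)
# The endpoint cell (10, 0) is now above 0.5; traversed cells are below 0.5.
print(occupancy_prob(grid[(10, 0)]), occupancy_prob(grid[(5, 0)]))
```

Clamping the log-odds prevents cells from becoming so certain that later observations cannot flip them, which matters when depth estimates are refined over time. A 3D system would cast one ray per pixel into a voxel map rather than a single image row into a 2D grid.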