The paper introduces SF3D-RGB, a deep learning architecture for estimating sparse scene flow by fusing monocular RGB images and sparse LiDAR point clouds. The model encodes features from both modalities, fuses them to enhance a graph matching module for initial scene flow estimation, and then refines the flow with a residual module. The approach balances accuracy and efficiency, achieving superior performance compared to single-modality methods and other fusion-based state-of-the-art techniques while using fewer parameters.
Fusing monocular RGB images with sparse LiDAR data lets you estimate scene flow more accurately and efficiently than using either modality alone.
Scene flow estimation is a key task in computer vision that supports the perception of dynamic changes in a scene. Learning-based approaches have recently achieved impressive results for robust scene flow using either image-based or LiDAR-based modalities. However, these methods have tended to rely on a single modality. To address this limitation, we present a deep learning architecture, SF3D-RGB, that enables sparse scene flow estimation using 2D monocular images and 3D point clouds (e.g., acquired by LiDAR) as inputs. Our architecture is an end-to-end model that first encodes information from each modality into features and fuses them together. The fused features then enhance a graph matching module, yielding a better and more robust mapping matrix from which an initial scene flow is generated. Finally, a residual scene flow module refines the initial scene flow. Our model is designed to strike a balance between accuracy and efficiency. Experiments show that our proposed method outperforms single-modality methods and achieves better scene flow accuracy on real-world datasets while using fewer parameters than other state-of-the-art fusion-based methods.
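The three stages described above (fuse features, compute a mapping matrix, refine the initial flow) can be illustrated with a minimal NumPy sketch. This is not the SF3D-RGB implementation: the function names, the concatenation-based fusion, the softmax over cosine similarities used in place of the graph matching module, and the `residual_fn` callable are all assumptions made for illustration, and the learned encoders are replaced by hand-made features.

```python
import numpy as np

def fuse_features(point_feats, image_feats):
    """Hypothetical fusion: concatenate per-point LiDAR features with image
    features assumed to be already sampled at each point's pixel."""
    return np.concatenate([point_feats, image_feats], axis=1)

def soft_mapping_matrix(feats_src, feats_tgt, temperature=0.01):
    """Assumed stand-in for graph matching: a row-wise softmax over cosine
    similarities. Row i holds soft correspondence weights from source
    point i to every target point."""
    a = feats_src / (np.linalg.norm(feats_src, axis=1, keepdims=True) + 1e-8)
    b = feats_tgt / (np.linalg.norm(feats_tgt, axis=1, keepdims=True) + 1e-8)
    logits = (a @ b.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def initial_scene_flow(points_src, points_tgt, mapping):
    """Initial flow: soft-matched target position minus source position."""
    return mapping @ points_tgt - points_src

def refine_flow(flow, residual_fn):
    """Residual refinement: add a correction predicted by residual_fn,
    a hypothetical callable standing in for the learned residual module."""
    return flow + residual_fn(flow)

# Toy data: 4 points translated by a known offset between two frames.
rng = np.random.default_rng(0)
points_src = rng.normal(size=(4, 3))
translation = np.array([0.5, 0.0, -0.2])
points_tgt = points_src + translation

# One-hot features make the correct correspondences unambiguous.
feats_src = fuse_features(np.eye(4), np.eye(4))
feats_tgt = fuse_features(np.eye(4), np.eye(4))

mapping = soft_mapping_matrix(feats_src, feats_tgt)
flow = initial_scene_flow(points_src, points_tgt, mapping)
flow = refine_flow(flow, residual_fn=lambda f: np.zeros_like(f))
print(flow)
```

In this toy setup the recovered flow matches the known translation for every point, because the features identify each correspondence exactly. With real, noisy features the mapping matrix would be softer, which is the situation the abstract's residual refinement stage is meant to correct.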