The paper introduces FreeOmniMVS, a novel reference-free framework for omnidirectional stereo matching that maximizes multi-view consistency. It uses a View-pair Correlation Transformer (VCT) to model pairwise correlation volumes across all camera view pairs. This lets the system disregard pairs made unreliable by occlusion or defocus. A lightweight attention mechanism then adaptively fuses the correlation vectors, which removes the need for a reference view and lets all cameras contribute equally.
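The following minimal NumPy sketch makes the "all camera view pairs" idea concrete. It is not the paper's VCT. It assumes each of N cameras has already produced a C-dimensional feature for each of D depth hypotheses along one ray. The function name `pairwise_correlations` and this array layout are hypothetical. The sketch scores every unordered pair with a scaled dot product, and because every pair is handled identically, no view acts as a reference.

```python
import itertools

import numpy as np


def pairwise_correlations(features: np.ndarray) -> tuple[np.ndarray, list[tuple[int, int]]]:
    """Correlate every unordered camera pair at every depth hypothesis.

    features: (N, D, C) array; for each of N cameras, the C-dim feature
    sampled at each of D depth hypotheses along one ray (hypothetical layout).
    Returns a (P, D) array with P = N * (N - 1) / 2, plus the (i, j) pair list.
    No camera is singled out as a reference: all pairs are scored the same way.
    """
    n_views, _, n_channels = features.shape
    pairs = list(itertools.combinations(range(n_views), 2))
    corr = np.stack(
        # Dot product of the two views' features at each depth, scaled by sqrt(C).
        [np.sum(features[i] * features[j], axis=-1) / np.sqrt(n_channels) for i, j in pairs]
    )
    return corr, pairs


# Usage: 4 hypothetical fisheye cameras, 32 depth hypotheses, 16 channels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 32, 16))
corr, pairs = pairwise_correlations(feats)
print(corr.shape, len(pairs))  # (6, 32) 6
```

With N cameras there are N(N-1)/2 pairs, so the number of correlation vectors grows quadratically with the number of views.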
Achieve robust omnidirectional depth estimation by ditching reference views and directly maximizing multi-view consistency with a novel View-pair Correlation Transformer.
Reliable omnidirectional depth estimation from multi-fisheye stereo matching is pivotal to many applications, such as embodied robotics. Existing approaches either rely on spherical sweeping with heuristic fusion strategies to build cost volumes, or perform reference-centric stereo matching on rectified views. However, these methods do not explicitly exploit the geometric relationships among multiple views, which makes them less able to capture global dependencies, visibility, or scale changes. In this paper, we take a different perspective and propose FreeOmniMVS, a novel reference-free framework based on multi-view consistency maximization. The key feature of FreeOmniMVS is that it aggregates pairwise correlations into a robust, visibility-aware global consensus. As a result, it is tolerant of occlusions, partial overlaps, and varying baselines. Specifically, to achieve global coherence, we introduce a novel View-pair Correlation Transformer (VCT) that explicitly models pairwise correlation volumes across all camera view pairs, allowing us to drop unreliable pairs caused by occlusion or out-of-focus observations. To realize a scalable and visibility-aware consensus, we propose a lightweight attention mechanism that adaptively fuses the correlation vectors, eliminating the need for a designated reference view and allowing all cameras to contribute equally to stereo matching. Extensive experiments on diverse benchmark datasets demonstrate the superiority of our method for globally consistent, visibility-aware, and scale-aware omnidirectional depth estimation.
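The visibility-aware fusion step can be illustrated with attention-style weighted pooling over camera pairs. This is a generic stand-in, not the paper's attention module. `fuse_pairs`, `soft_argmax_depth`, the linear scoring parameters `w` and `b`, and the boolean `valid` mask are all hypothetical. In a trained model, the pair scores would come from learned layers. Pairs flagged invalid (for example, occluded or out of focus) receive zero weight, and the remaining pairs are combined with softmax weights. A soft-argmax over depth hypotheses, an assumed readout, then turns the fused score into a depth estimate.

```python
import numpy as np


def fuse_pairs(corr: np.ndarray, valid: np.ndarray, w: np.ndarray, b: float = 0.0) -> np.ndarray:
    """Fuse (P, D) pair correlations into one (D,) reference-free matching score.

    valid: (P,) bool mask; False drops a pair judged unreliable.
    w, b:  hypothetical stand-ins for learned attention parameters.
    """
    if not valid.any():
        raise ValueError("at least one camera pair must be valid")
    logits = corr @ w + b                      # one attention logit per pair
    logits = np.where(valid, logits, -np.inf)  # masked pairs get zero weight
    logits = logits - logits[valid].max()      # shift for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()                   # softmax over pairs only
    return weights @ corr                      # weighted sum of pair correlations


def soft_argmax_depth(score: np.ndarray, depths: np.ndarray) -> float:
    """Expected depth under a softmax over the fused matching scores."""
    p = np.exp(score - score.max())
    p /= p.sum()
    return float(p @ depths)


# Usage with random data: 6 pairs (4 cameras), 32 depth hypotheses.
rng = np.random.default_rng(0)
corr = rng.standard_normal((6, 32))
valid = np.array([True, True, False, True, True, True])  # pair 2 dropped
w = np.full(32, 1.0 / 32)  # hypothetical: mean correlation as the pair score
depths = np.linspace(0.5, 10.0, 32)
fused = fuse_pairs(corr, valid, w)
print(fused.shape, soft_argmax_depth(fused, depths))
```

Because the softmax runs over pairs rather than over views relative to one anchor camera, adding or masking a camera only changes which pairs enter the pool.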