SyncFix is a diffusion-based refinement framework for 3D reconstructions that enforces cross-view consistency by learning a joint conditional over multiple views during denoising. It addresses semantic and geometric inconsistencies by formulating refinement as a joint latent bridge matching problem that synchronizes distorted and clean representations across views. Although trained only on image pairs, the framework generalizes to arbitrary view counts at inference, and reconstruction quality improves as more views are added, with diminishing returns at higher counts. It surpasses state-of-the-art baselines even without clean reference images, and achieves higher fidelity still when sparse references are available.
Fix your janky 3D reconstructions with SyncFix, a diffusion-based method that leverages multi-view consistency to produce high-fidelity results, even without clean reference images.
We present SyncFix, a framework that enforces cross-view consistency during the diffusion-based refinement of reconstructed scenes. SyncFix formulates refinement as a joint latent bridge matching problem, synchronizing distorted and clean representations across multiple views to fix semantic and geometric inconsistencies. In other words, SyncFix learns a joint conditional over multiple views to enforce consistency throughout the denoising trajectory. We train only on image pairs, yet the model generalizes naturally to an arbitrary number of views at inference. Moreover, reconstruction quality improves with additional views, with diminishing returns at higher view counts. Qualitative and quantitative results demonstrate that SyncFix consistently generates high-quality reconstructions and surpasses current state-of-the-art baselines, even in the absence of clean reference images. SyncFix achieves even higher fidelity when sparse references are available.
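The abstract does not specify the bridge parameterization or the network architecture, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes a Brownian bridge between paired clean and distorted latents (with t=0 at the clean end, a convention chosen here for illustration) and uses a parameter-free attention over the view axis as a stand-in for whatever cross-view mechanism SyncFix actually uses. The function names `bridge_sample` and `cross_view_attention`, the `sigma` parameter, and the latent shapes are all hypothetical.

```python
import numpy as np

def bridge_sample(z_clean, z_distorted, t, sigma=1.0, rng=None):
    """Sample a point on a Brownian bridge between paired latents.

    z_clean, z_distorted: arrays of shape (V, D), one row per view.
    t: scalar in [0, 1]. ASSUMPTION: t=0 is the clean end and t=1 the
    distorted end; the paper may use the opposite convention.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(z_clean.shape)
    mean = (1.0 - t) * z_clean + t * z_distorted
    std = sigma * np.sqrt(t * (1.0 - t))  # vanishes at both endpoints
    return mean + std * noise

def cross_view_attention(z):
    """Mix latents across views with softmax attention over the view axis.

    ASSUMPTION: an illustrative, parameter-free stand-in for a learned
    cross-view layer. Nothing here depends on V, which is why a layer of
    this kind can be trained on pairs (V=2) and applied to more views.
    """
    scores = z @ z.T / np.sqrt(z.shape[1])               # (V, V) similarities
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ z                                   # (V, D) mixed latents

# Usage: a pair of views, as in training, then five views, as at inference.
rng = np.random.default_rng(0)
for num_views in (2, 5):
    z0 = rng.standard_normal((num_views, 8))  # hypothetical clean latents
    z1 = rng.standard_normal((num_views, 8))  # hypothetical distorted latents
    zt = bridge_sample(z0, z1, t=0.5, rng=rng)
    mixed = cross_view_attention(zt)
```

Because the attention operates over whatever number of rows it receives, the same function handles two views or five without modification, which mirrors the pair-trained, many-view-inference behavior described above. A real implementation would replace it with a learned network that predicts the bridge drift jointly for all views.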