The paper introduces GeoFusionLRM, a self-correction framework for single-image 3D reconstruction that improves geometric consistency and detail alignment in large reconstruction models (LRMs). GeoFusionLRM feeds the model's own normal and depth predictions back through a dedicated transformer and fusion module, so the model can correct its errors and enforce consistency with the input image. Experiments show that GeoFusionLRM achieves sharper geometry, more consistent normals, and higher fidelity than state-of-the-art LRM baselines, without requiring additional supervision.
By feeding a large reconstruction model its own geometry predictions, GeoFusionLRM achieves sharper and more consistent 3D reconstructions from single images.
Single-image 3D reconstruction with large reconstruction models (LRMs) has advanced rapidly, yet reconstructions often exhibit geometric inconsistencies and misaligned details that limit fidelity. We introduce GeoFusionLRM, a geometry-aware self-correction framework that leverages the model's own normal and depth predictions to refine structural accuracy. Unlike prior approaches that rely solely on features extracted from the input image, GeoFusionLRM feeds back geometric cues through a dedicated transformer and fusion module, enabling the model to correct errors and enforce consistency with the conditioning image. This design improves the alignment between the reconstructed mesh and the input views without additional supervision or external signals. Extensive experiments demonstrate that GeoFusionLRM achieves sharper geometry, more consistent normals, and higher fidelity than state-of-the-art LRM baselines.
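The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so every name below (`fuse`, `self_correct`, `reconstruct`, `render_geometry`, `encode_geometry`, `gate`) is hypothetical, the fixed linear blend stands in for the learned fusion module, and the single refinement pass is an assumption.

```python
from typing import Callable, List, Tuple

Tokens = List[List[float]]  # one feature vector per token


def fuse(image_tokens: Tokens, geometry_tokens: Tokens, gate: float = 0.5) -> Tokens:
    """Blend each image token with its matching geometry token.

    Assumption: a fixed linear blend stands in for a learned fusion module.
    """
    return [
        [(1.0 - gate) * i + gate * g for i, g in zip(img, geo)]
        for img, geo in zip(image_tokens, geometry_tokens)
    ]


def self_correct(
    image_tokens: Tokens,
    reconstruct: Callable[[Tokens], Tokens],
    render_geometry: Callable[[Tokens], Tuple[Tokens, List[float]]],
    encode_geometry: Callable[[Tokens, List[float]], Tokens],
    gate: float = 0.5,
) -> Tokens:
    """Run one hypothetical self-correction pass."""
    # 1. Initial reconstruction from image features alone.
    initial = reconstruct(image_tokens)
    # 2. Derive normal and depth predictions from that reconstruction.
    normals, depth = render_geometry(initial)
    # 3. Encode the geometric cues (the "dedicated transformer" stand-in).
    geometry_tokens = encode_geometry(normals, depth)
    # 4. Fuse geometry with image features and reconstruct again.
    return reconstruct(fuse(image_tokens, geometry_tokens, gate))


# Toy stand-ins so the sketch runs end to end; none of these model a real LRM.
def toy_reconstruct(tokens: Tokens) -> Tokens:
    return [list(t) for t in tokens]


def toy_render_geometry(recon: Tokens) -> Tuple[Tokens, List[float]]:
    return recon, [sum(t) for t in recon]


def toy_encode_geometry(normals: Tokens, depth: List[float]) -> Tokens:
    return [[n + d for n in ns] for ns, d in zip(normals, depth)]


image_tokens = [[1.0, 2.0], [3.0, 4.0]]
refined = self_correct(
    image_tokens, toy_reconstruct, toy_render_geometry, toy_encode_geometry
)
print(refined)
```

The point of the sketch is only the data flow: the second reconstruction pass is conditioned on both the image features and the geometry the model itself predicted, which is what lets it correct its first estimate without external signals.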