This paper introduces a view-consistent 3D scene editing framework that addresses cross-view inconsistency by modeling the joint distribution of edits across multiple viewpoints. A dual-path consistency mechanism, combining projection-guided structural guidance with patch-level semantic propagation, is intended to improve the robustness and generalization of 3D scene edits. The authors report that the approach outperforms existing methods, producing precise and consistent views in complex scenes.
Achieving superior 3D scene edits hinges on a dual-path consistency mechanism that effectively integrates structural and semantic cues across multiple views.
Text-driven 3D scene editing has recently attracted increasing attention. Most existing methods follow a render-edit-optimize pipeline, where multi-view images are rendered from a 3D scene, edited with 2D image editors, and then used to optimize the underlying 3D representation. However, cross-view inconsistency remains a major bottleneck. Although recent methods introduce geometric cues, cross-view interactions, or video priors to mitigate this issue, they still largely rely on inference-time synchronization and thus remain limited in robustness and generalization. In this work, we recast multi-view consistent 3D editing from a distributional perspective: 3D scene editing essentially requires joint distribution modeling across viewpoints. Based on this insight, we propose a view-consistent 3D editing framework that explicitly introduces cross-view dependencies into the editing process. Furthermore, motivated by the observation that structural correspondence and semantic continuity rely on different cross-view cues, we introduce a dual-path consistency mechanism consisting of projection-guided structural guidance and patch-level semantic propagation for effective cross-view editing. We also construct a paired multi-view editing dataset that provides reliable supervision for learning cross-view consistency in edited scenes. Extensive experiments demonstrate that our method achieves superior editing performance with precise and consistent views for complex scenes.
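The distributional perspective can be made concrete with a short sketch. The notation here is assumed for illustration and is not taken from the paper: x_1, ..., x_N are the rendered views, y_1, ..., y_N are the edited views, and c is the text instruction. Editing each view independently with a 2D editor amounts to treating the edited views as conditionally independent, whereas view-consistent editing needs the joint distribution, which in general does not factorize that way.

```latex
% Assumed notation (not from the paper):
%   x_i : i-th rendered view,  y_i : i-th edited view,  c : text instruction
% Per-view editing implicitly assumes conditional independence across views:
\[
  p_{\text{indep}}(y_1,\dots,y_N \mid x_1,\dots,x_N, c)
    \;=\; \prod_{i=1}^{N} p(y_i \mid x_i, c)
\]
% The joint distribution, written with the chain rule, keeps the
% dependence of each edited view on the other views:
\[
  p(y_1,\dots,y_N \mid x_1,\dots,x_N, c)
    \;=\; \prod_{i=1}^{N} p\bigl(y_i \mid y_1,\dots,y_{i-1},\, x_1,\dots,x_N,\, c\bigr)
\]
```

Under this reading, cross-view inconsistency comes from dropping the conditioning on the other views in the second expression, which is the dependence the abstract says the framework introduces explicitly.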