Mar 4, 2026 · arXiv:2603.04179

NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

Weirong Chen, Chuanxia Zheng, Ganlin Zhang, Andrea Vedaldi, Daniel Cremers

AI Summary

NOVA3R, a non-pixel-aligned visual transformer, reconstructs complete 3D scenes from unposed images by learning a global, view-agnostic scene representation. A scene-token mechanism aggregates information across the input images, and a diffusion-based 3D decoder turns that representation into a complete point cloud. This design addresses two limitations of pixel-aligned methods: incomplete geometry and duplicated structures in overlapping regions. In experiments on scene-level and object-level datasets, NOVA3R outperforms state-of-the-art methods in reconstruction accuracy and completeness.
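The scene-token idea can be illustrated with a minimal sketch, assuming it behaves like a set of learnable query tokens that cross-attend to the patch tokens of every input view. The function name `aggregate_scene_tokens`, the single attention head, and all token counts are hypothetical; the summary does not specify NOVA3R's actual layers. Because the queries attend to the pooled set of patch tokens and no camera pose enters the computation, the output is a fixed-size scene representation that does not depend on how many views are given or in which order.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_scene_tokens(scene_queries, image_tokens_per_view, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention from learnable scene queries
    to the concatenated patch tokens of all views (no camera poses used)."""
    # Pool every view's patch tokens into one unordered set: (V * P, D)
    kv = np.concatenate(image_tokens_per_view, axis=0)
    q = scene_queries @ w_q                      # (S, D)
    k = kv @ w_k                                 # (V * P, D)
    v = kv @ w_v                                 # (V * P, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (S, V * P)
    return scene_queries + attn @ v              # residual update, (S, D)

rng = np.random.default_rng(0)
D, S, P, V = 32, 16, 49, 3                       # width, scene tokens, patches per view, views
scene_queries = rng.standard_normal((S, D))
views = [rng.standard_normal((P, D)) for _ in range(V)]
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

tokens = aggregate_scene_tokens(scene_queries, views, w_q, w_k, w_v)
tokens_reordered = aggregate_scene_tokens(scene_queries, views[::-1], w_q, w_k, w_v)
print(tokens.shape, np.allclose(tokens, tokens_reordered))
```

The number of scene tokens `S` stays fixed as `V` grows, so downstream decoding works on the same-sized representation regardless of input count.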

Key Contribution

Ditch the per-ray prediction bottleneck: NOVA3R's global scene representation and diffusion-based decoder unlock more complete and accurate 3D reconstructions from unposed images.
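The diffusion-based decoder can be sketched as standard DDPM ancestral sampling of a point cloud, assuming the decoder predicts noise conditioned on the scene representation. The linear schedule (T = 1000, beta from 1e-4 to 0.02) is the common DDPM default, not a confirmed NOVA3R setting, and `oracle_denoiser` is a hypothetical stand-in for the trained network: it cheats by knowing the target cloud, which lets the sampling loop be checked end to end.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Standard DDPM linear beta schedule (assumed, not taken from the paper)
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def oracle_denoiser(x_t, i, cond, alpha_bars):
    # Hypothetical stand-in for a learned eps_theta(x_t, t, scene_tokens).
    # Here `cond` is the clean target cloud, so the injected noise is recovered exactly.
    ab = alpha_bars[i]
    return (x_t - np.sqrt(ab) * cond) / np.sqrt(1.0 - ab)

def ddpm_sample(denoiser, cond, num_points, T=1000, seed=0):
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    # Start from Gaussian noise; the point count is a free choice, not tied to pixels
    x = rng.standard_normal((num_points, 3))
    for i in reversed(range(T)):  # i = t - 1
        ab = alpha_bars[i]
        ab_prev = alpha_bars[i - 1] if i > 0 else 1.0
        eps = denoiser(x, i, cond, alpha_bars)
        x0_pred = (x - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
        # Mean and variance of the DDPM posterior q(x_{t-1} | x_t, x0_pred)
        mean = (np.sqrt(ab_prev) * betas[i] / (1.0 - ab)) * x0_pred \
            + (np.sqrt(alphas[i]) * (1.0 - ab_prev) / (1.0 - ab)) * x
        var = betas[i] * (1.0 - ab_prev) / (1.0 - ab)
        noise = rng.standard_normal(x.shape) if i > 0 else np.zeros_like(x)
        x = mean + np.sqrt(var) * noise
    return x

target = np.random.default_rng(1).uniform(-1.0, 1.0, size=(256, 3))
cloud = ddpm_sample(oracle_denoiser, target, num_points=256)
print(cloud.shape, np.abs(cloud - target).max())
```

Because `num_points` is chosen independently of image resolution and view count, a decoder of this kind is not forced to emit one point per pixel per view, which is the per-ray coupling the non-pixel-aligned formulation avoids.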

Abstract

We present NOVA3R, an effective approach for non-pixel-aligned 3D reconstruction from a set of unposed images in a feed-forward manner. Unlike pixel-aligned methods that tie geometry to per-ray predictions, our formulation learns a global, view-agnostic scene representation that decouples reconstruction from pixel alignment. This addresses two key limitations of pixel-aligned 3D reconstruction: (1) it recovers both visible and invisible points with a complete scene representation, and (2) it produces physically plausible geometry with fewer duplicated structures in overlapping regions. To achieve this, we introduce a scene-token mechanism that aggregates information across unposed images and a diffusion-based 3D decoder that reconstructs complete, non-pixel-aligned point clouds. Extensive experiments on both scene-level and object-level datasets demonstrate that NOVA3R outperforms state-of-the-art methods in terms of reconstruction accuracy and completeness.
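Accuracy and completeness are commonly computed as directed nearest-neighbour distances between the predicted and ground-truth point clouds. The sketch below uses those common definitions with a brute-force search; the paper's exact evaluation protocol (alignment, thresholds, normalization) is not given here and may differ, and the Chamfer averaging convention is one of several in use.

```python
import numpy as np

def nn_distances(src, dst):
    """For every point in src, Euclidean distance to its nearest neighbour in dst."""
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)  # (N, M)
    return d.min(axis=1)

def accuracy(pred, gt):
    # Mean distance from each predicted point to the ground truth (lower is better)
    return float(nn_distances(pred, gt).mean())

def completeness(pred, gt):
    # Mean distance from each ground-truth point to the prediction (lower is better)
    return float(nn_distances(gt, pred).mean())

def chamfer(pred, gt):
    # One common convention: average of the two directed terms
    return 0.5 * (accuracy(pred, gt) + completeness(pred, gt))

pred = np.array([[0.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(accuracy(pred, gt))      # 0.0
print(completeness(pred, gt))  # 0.5
print(chamfer(pred, gt))       # 0.25
```

In the toy example the prediction covers only one of the two ground-truth points, as a method limited to visible surfaces might: accuracy stays perfect while completeness is penalized, which is the gap amodal reconstruction targets. The dense `(N, M)` distance matrix is fine for small clouds; for large ones a KD-tree such as `scipy.spatial.cKDTree` is the usual replacement.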

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
