ShanghaiTech | Mar 18, 2026 | arXiv:2603.17571

PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery

Yijing Guo, Mengjun Chao, Luo Wang, Tianyang Zhao, Haizhao Dai, Yingliang Zhang, Jingyi Yu, Yujiao Shi

AI Summary

PanoVGGT, a novel permutation-equivariant Transformer framework, is introduced to jointly predict camera poses, depth maps, and 3D point clouds from panoramic imagery in a single forward pass. It addresses the challenges of non-pinhole distortions in panoramic images by incorporating spherical-aware positional embeddings, panorama-specific three-axis SO(3) rotation augmentation, and a stochastic anchoring strategy. Experiments on the newly introduced PanoCity dataset and standard benchmarks demonstrate competitive accuracy, robustness, and improved cross-domain generalization compared to existing feed-forward models designed for perspective cameras.

Key Contribution

PanoVGGT is a Transformer for panoramic 3D reconstruction that handles spherical distortions and global-frame ambiguity, delivering competitive accuracy in a single forward pass.

Abstract

Panoramic imagery offers a full 360° field of view and is increasingly common in consumer devices. However, it introduces non-pinhole distortions that challenge joint pose estimation and 3D reconstruction. Existing feed-forward models, built for perspective cameras, generalize poorly to this setting. We propose PanoVGGT, a permutation-equivariant Transformer framework that jointly predicts camera poses, depth maps, and 3D point clouds from one or multiple panoramas in a single forward pass. The model incorporates spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation, enabling effective geometric reasoning in the spherical domain. To resolve inherent global-frame ambiguity, we further introduce a stochastic anchoring strategy during training. In addition, we contribute PanoCity, a large-scale outdoor panoramic dataset with dense depth and 6-DoF pose annotations. Extensive experiments on PanoCity and standard benchmarks demonstrate that PanoVGGT achieves competitive accuracy, strong robustness, and improved cross-domain generalization. Code and dataset will be released.
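To make the idea of a three-axis SO(3) rotation augmentation concrete, the following is a minimal NumPy sketch of rotating an equirectangular panorama by an arbitrary rotation matrix. It is an illustration under stated assumptions, not the authors' implementation: the equirectangular layout, the axis convention (y up, z forward), nearest-neighbor resampling, and the helper names `pixel_rays`, `rays_to_pixels`, `random_so3`, and `rotate_panorama` are all hypothetical choices made here.

```python
import numpy as np


def pixel_rays(h, w):
    """Unit viewing direction for each pixel center of an h x w equirectangular image.

    Assumed convention (not from the paper): longitude spans [-pi, pi) left to
    right, latitude spans (pi/2, -pi/2) top to bottom, y is up, z is forward.
    """
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both of shape (h, w)
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )


def rays_to_pixels(d, h, w):
    """Nearest pixel (row, col) hit by each unit direction in d (shape (..., 3))."""
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    u = (lon + np.pi) / (2.0 * np.pi) * w - 0.5
    v = (np.pi / 2.0 - lat) / np.pi * h - 0.5
    col = np.round(u).astype(int) % w                 # longitude wraps around
    row = np.clip(np.round(v).astype(int), 0, h - 1)  # latitude is clamped
    return row, col


def random_so3(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # remove the sign ambiguity of the QR factors
    if np.linalg.det(q) < 0:     # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q


def rotate_panorama(img, R):
    """Resample img so that scene content is rotated by R around the camera center."""
    h, w = img.shape[:2]
    d_out = pixel_rays(h, w)
    # For row vectors, d @ R equals (R^T d)^T: the source direction of each output pixel.
    d_src = d_out @ R
    row, col = rays_to_pixels(d_src, h, w)
    return img[row, col]
```

Calling `rotate_panorama(img, random_so3(np.random.default_rng(0)))` would return an image of the same shape with yaw, pitch, and roll all perturbed. In a training pipeline the depth map would presumably be resampled with the same mapping and the ground-truth camera pose composed with the same rotation; how PanoVGGT handles these details is not stated in the abstract.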

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
