COWARobot Co. Ltd, Hohai, SJTU | Mar 18, 2026 | arXiv:2603.17382

VisionNVS: Self-Supervised Inpainting for Novel View Synthesis under the Virtual-Shift Paradigm

Hongbo Lu, Chenghao He, Fan Liu, Wenlong Liao, Tao He, Pai Peng

AI Summary

VisionNVS reframes novel view synthesis (NVS) as a self-supervised inpainting problem by introducing a "Virtual-Shift" strategy that uses monocular depth proxies to simulate occlusion patterns in the original view. This allows the use of raw recorded images as pixel-perfect supervision, eliminating the domain gap of previous NVS approaches. The method also incorporates a Pseudo-3D Seam Synthesis strategy to integrate visual data from adjacent cameras, explicitly modeling photometric discrepancies and calibration errors. VisionNVS achieves improved geometric fidelity and visual quality compared to LiDAR-dependent baselines, demonstrating a robust solution for scalable driving simulation.

Key Contribution

By recasting novel view synthesis as a self-supervised inpainting problem, VisionNVS removes the need for ground-truth images of novel views and, in the authors' experiments, outperforms LiDAR-dependent baselines.

Abstract

A fundamental bottleneck in Novel View Synthesis (NVS) for autonomous driving is the inherent supervision gap on novel trajectories: models are tasked with synthesizing unseen views during inference, yet lack ground truth images for these shifted poses during training. In this paper, we propose VisionNVS, a camera-only framework that fundamentally reformulates view synthesis from an ill-posed extrapolation problem into a self-supervised inpainting task. By introducing a "Virtual-Shift" strategy, we use monocular depth proxies to simulate occlusion patterns and map them onto the original view. This paradigm shift allows the use of raw, recorded images as pixel-perfect supervision, effectively eliminating the domain gap inherent in previous approaches. Furthermore, we address spatial consistency through a Pseudo-3D Seam Synthesis strategy, which integrates visual data from adjacent cameras during training to explicitly model real-world photometric discrepancies and calibration errors. Experiments demonstrate that VisionNVS achieves superior geometric fidelity and visual quality compared to LiDAR-dependent baselines, offering a robust solution for scalable driving simulation.
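The Virtual-Shift idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a purely lateral camera translation, a pinhole camera with a known horizontal focal length, and a dense depth map standing in for the monocular depth proxy. The function names `virtual_shift_mask` and `make_inpainting_pair` and all parameters are hypothetical. The sketch forward-projects each pixel into the shifted view, marks target pixels that no source pixel reaches as holes, and then reuses that hole pattern in the original pixel grid so the recorded image can act as the supervision target. How the paper actually maps occlusion patterns back onto the original view is not specified in this text.

```python
import numpy as np


def virtual_shift_mask(depth, focal_px, shift_m):
    """Hole mask produced by a hypothetical lateral camera shift.

    depth    : (H, W) array of positive depths in metres (depth proxy).
    focal_px : horizontal focal length in pixels (assumed known).
    shift_m  : lateral translation in metres (positive = camera moves right).
    Returns a boolean (H, W) array, True where the shifted view has no data.
    """
    h, w = depth.shape
    # A camera moving right by t shifts a point at depth Z left by f * t / Z pixels.
    disparity = focal_px * shift_m / depth
    rows, cols = np.indices((h, w))
    target_cols = np.rint(cols - disparity).astype(int)
    # Keep only projections that land inside the image.
    valid = (target_cols >= 0) & (target_cols < w)
    covered = np.zeros((h, w), dtype=bool)
    covered[rows[valid], target_cols[valid]] = True
    return ~covered


def make_inpainting_pair(image, mask):
    """Blank out masked pixels; the untouched image is the training target."""
    masked = image.copy()
    masked[mask] = 0
    return masked, image
```

A training loop built on this sketch would feed `masked` to an inpainting network and compare its output against the recorded image, which is what lets the raw frame serve as supervision without any ground truth for the shifted pose.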

Computer Vision | Robotics & Embodied AI | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
