ZJU | Apr 9, 2026 | arXiv:2604.08542

Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction

Tao Xie, Peishan Yang, Yudong Jin, Yingfeng Cai, Wei Yin, Weiqiang Ren, Qian Zhang, Wei Hua, Sida Peng, Xiaoyang Guo

AI Summary

The paper introduces Scal3R, a test-time training approach for large-scale 3D reconstruction from long video sequences. Scal3R employs lightweight neural sub-networks to compress and retain long-range scene information, enabling the model to leverage extensive contextual cues for enhanced reconstruction accuracy and consistency. Experiments on KITTI Odometry and Oxford Spires datasets demonstrate that Scal3R achieves state-of-the-art 3D reconstruction accuracy and pose accuracy while maintaining efficiency in ultra-large scenes.

Key Contribution

Achieve state-of-the-art 3D reconstruction in large-scale scenes by rapidly adapting lightweight neural networks during test time to capture global context.

Abstract

This paper addresses the task of large-scale 3D scene reconstruction from long video sequences. Recent feed-forward reconstruction models have shown promising results by directly regressing 3D geometry from RGB images without explicit 3D priors or geometric constraints. However, these methods often struggle to maintain reconstruction accuracy and consistency over long sequences due to limited memory capacity and the inability to effectively capture global contextual cues. In contrast, humans can naturally exploit the global understanding of the scene to inform local perception. Motivated by this, we propose a novel neural global context representation that efficiently compresses and retains long-range scene information, enabling the model to leverage extensive contextual cues for enhanced reconstruction accuracy and consistency. The context representation is realized through a set of lightweight neural sub-networks that are rapidly adapted during test time via self-supervised objectives, which substantially increases memory capacity without incurring significant computational overhead. The experiments on multiple large-scale benchmarks, including the KITTI Odometry~\cite{Geiger2012CVPR} and Oxford Spires~\cite{tao2025spires} datasets, demonstrate the effectiveness of our approach in handling ultra-large scenes, achieving leading pose accuracy and state-of-the-art 3D reconstruction accuracy while maintaining efficiency. Code is available at https://zju3dv.github.io/scal3r.
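As a rough illustration of the test-time adaptation idea described in the abstract, the sketch below keeps a small linear "fast-weight" memory and updates it with gradient steps on a self-supervised reconstruction loss as chunks of a long sequence arrive. It is a generic test-time-training toy, not the Scal3R architecture: the class name FastWeightMemory, the key/value reconstruction objective, the learning rate, and the chunking scheme are all assumptions made for illustration.

```python
# Illustrative sketch only. A linear fast-weight memory adapted at test time.
# FastWeightMemory, the key/value loss, lr, and steps are hypothetical choices,
# not taken from the Scal3R paper or its code.

class FastWeightMemory:
    def __init__(self, dim, lr=0.1):
        self.dim = dim
        self.lr = lr
        # W starts at zero: the memory holds no scene context yet.
        self.W = [[0.0] * dim for _ in range(dim)]

    def read(self, key):
        """Query the memory: return the matrix-vector product W @ key."""
        return [sum(w * k for w, k in zip(row, key)) for row in self.W]

    def loss(self, key, value):
        """Self-supervised objective: 0.5 * ||W @ key - value||^2."""
        pred = self.read(key)
        return 0.5 * sum((p - v) ** 2 for p, v in zip(pred, value))

    def update(self, key, value):
        """One gradient-descent step on the loss with respect to W."""
        pred = self.read(key)
        err = [p - v for p, v in zip(pred, value)]
        # The gradient of the loss with respect to W is outer(err, key).
        for i in range(self.dim):
            for j in range(self.dim):
                self.W[i][j] -= self.lr * err[i] * key[j]


def process_sequence(memory, chunks, steps=5):
    """Adapt the memory chunk by chunk; return the loss on each chunk after adapting.

    chunks: list of chunks, each a list of (key, value) pairs of length memory.dim.
    """
    losses = []
    for chunk in chunks:
        for _ in range(steps):
            for key, value in chunk:
                memory.update(key, value)
        losses.append(sum(memory.loss(k, v) for k, v in chunk))
    return losses


if __name__ == "__main__":
    mem = FastWeightMemory(dim=2, lr=0.1)
    chunks = [
        [([1.0, 0.0], [0.5, -0.5])],
        [([0.0, 1.0], [0.2, 0.3])],
    ]
    print(process_sequence(mem, chunks))
```

The point of the toy is that the memory's size is fixed by dim regardless of sequence length, so context from earlier chunks is compressed into the weights rather than stored explicitly. How the actual method structures its sub-networks and objectives is specified in the paper, not here.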

Computer Vision | Robotics & Embodied AI | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 101

Year: 2026

Venue: N/A
