Apr 15, 2026 | arXiv:2604.14141

Geometric Context Transformer for Streaming 3D Reconstruction

Lin-Zhuo Chen, Jian Gao, Yihang Chen, Ka Leong Cheng, Yipengjing Sun, Liangxiao Hu, Nan Xue, Xing Zhu, Yujun Shen, Yao Yao, Yinghao Xu

AI Summary

The paper introduces LingBot-Map, a feed-forward 3D foundation model for streaming 3D reconstruction, built on a geometric context transformer (GCT) architecture. The GCT's attention mechanism combines an anchor context, a pose-reference window, and a trajectory memory, which address coordinate grounding, dense geometric cues, and long-range drift correction, respectively. According to the authors' experiments, LingBot-Map outperforms existing streaming and iterative optimization-based approaches across a variety of benchmarks while sustaining inference at around 20 FPS on sequences exceeding 10,000 frames.

Key Contribution

A feed-forward transformer reconstructs 3D scenes from streaming video at around 20 FPS and, in the authors' evaluations, outperforms both existing streaming and iterative optimization-based approaches.

Abstract

Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric accuracy, temporal consistency, and computational efficiency. Motivated by the principles of Simultaneous Localization and Mapping (SLAM), we introduce LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. A defining aspect of LingBot-Map lies in its carefully designed attention mechanism, which integrates an anchor context, a pose-reference window, and a trajectory memory to address coordinate grounding, dense geometric cues, and long-range drift correction, respectively. This design keeps the streaming state compact while retaining rich geometric context, enabling stable, efficient inference at around 20 FPS on 518 × 378 resolution inputs over long sequences exceeding 10,000 frames. Extensive evaluations across a variety of benchmarks demonstrate that our approach achieves superior performance compared to both existing streaming and iterative optimization-based approaches.
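
To make the "compact streaming state" idea concrete, the following is a minimal bookkeeping sketch, not the authors' implementation. It only tracks which frames would sit in each of the three context types named in the abstract and how many tokens the resulting context would hold. Every name and number in it is an assumption made for illustration: the class `StreamingContext`, one anchor frame, a window of 8 frames, a 14-pixel patch size, and one compact token per older frame are all hypothetical and do not come from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StreamingContext:
    """Hypothetical bookkeeping for the three context types in the abstract.

    All sizes below are illustrative assumptions, not values from the paper.
    """
    num_anchor_frames: int = 1        # assumed: first frame(s) fix the coordinate frame
    window_size: int = 8              # assumed: recent frames kept with dense tokens
    patch_size: int = 14              # assumed: ViT-style patch size
    width: int = 518                  # input resolution stated in the abstract
    height: int = 378
    memory_tokens_per_frame: int = 1  # assumed: one compact token per older frame
    anchor: list = field(default_factory=list)
    window: deque = field(default_factory=deque)
    memory: list = field(default_factory=list)

    @property
    def dense_tokens_per_frame(self) -> int:
        # Number of non-overlapping patches in one frame.
        return (self.width // self.patch_size) * (self.height // self.patch_size)

    def push(self, frame_idx: int) -> None:
        """Register a new frame and demote the oldest window frame to memory."""
        if len(self.anchor) < self.num_anchor_frames:
            self.anchor.append(frame_idx)
            return
        self.window.append(frame_idx)
        if len(self.window) > self.window_size:
            self.memory.append(self.window.popleft())

    def context_tokens(self) -> int:
        """Tokens a new frame would attend to under these assumptions."""
        dense_frames = len(self.anchor) + len(self.window)
        return (dense_frames * self.dense_tokens_per_frame
                + len(self.memory) * self.memory_tokens_per_frame)


ctx = StreamingContext()
for i in range(10_000):
    ctx.push(i)

full_dense = 10_000 * ctx.dense_tokens_per_frame
print(ctx.context_tokens(), "tokens in the sketched context vs", full_dense, "if every frame stayed dense")
```

Under these assumed sizes, the context after 10,000 frames is dominated by the compact memory entries rather than by dense per-frame tokens, which is the qualitative behavior the abstract describes. The real model's window length, token counts, and memory format may differ.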

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 98

Year: 2026

Venue: N/A
