NVIDIA, Adobe Research | Mar 4, 2026 | arXiv:2603.03744

DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation

Tuan Duc Ngo, Jiahui Huang, Seoung Wug Oh, Kevin Blackburn-Matzen, Evangelos Kalogerakis, Chuang Gan, Joon-Young Lee

AI Summary

DAGE, a dual-stream transformer architecture, is introduced to address the challenges of high-resolution, long-sequence geometry and camera pose estimation from multi-view inputs. The architecture uses a low-resolution stream with frame/global attention for view consistency and camera estimation, and a high-resolution stream to preserve fine details. By fusing these streams with cross-attention, DAGE achieves state-of-the-art results in video geometry estimation and multi-view reconstruction while scaling effectively to 2K inputs.

Key Contribution

DAGE achieves state-of-the-art results in high-resolution video geometry estimation by disentangling global coherence from fine detail with a dual-stream transformer architecture.

Abstract

Estimating accurate, view-consistent geometry and camera poses from uncalibrated multi-view/video inputs remains challenging, especially at high spatial resolutions and over long sequences. We present DAGE, a dual-stream transformer whose main novelty is to disentangle global coherence from fine detail. A low-resolution stream operates on aggressively downsampled frames with alternating frame/global attention to build a view-consistent representation and estimate cameras efficiently, while a high-resolution stream processes the original images per-frame to preserve sharp boundaries and small structures. A lightweight adapter fuses these streams via cross-attention, injecting global context without disturbing the pretrained single-frame pathway. This design scales resolution and clip length independently, supports inputs up to 2K, and maintains practical inference cost. DAGE delivers sharp depth/pointmaps, strong cross-view consistency, and accurate poses, establishing new state-of-the-art results for video geometry estimation and multi-view reconstruction.
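To make the cost argument in the abstract concrete, the following is a rough back-of-the-envelope sketch that counts pairwise attention interactions (query-key pairs) for one layer of each kind. It is not the authors' implementation. The patch size of 16, the 2048x1152 and 256x144 resolutions, the 32-frame clip, and the choice that each high-resolution frame cross-attends only to its own frame's low-resolution tokens are all assumptions made for illustration; the function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def tokens_per_frame(height: int, width: int, patch: int = 16) -> int:
    """Patch tokens for one frame (patch size is an assumed value)."""
    return ceil_div(height, patch) * ceil_div(width, patch)


def single_stream_pairs(frames: int, hi: int) -> int:
    """Frame + global attention run directly on high-resolution tokens."""
    frame_attn = frames * hi * hi        # each frame attends within itself
    global_attn = (frames * hi) ** 2     # all tokens of all frames attend jointly
    return frame_attn + global_attn


def dual_stream_pairs(frames: int, hi: int, lo: int) -> int:
    """Dual-stream layout: global attention only on low-resolution tokens."""
    low_stream = frames * lo * lo + (frames * lo) ** 2  # frame + global, low-res
    high_stream = frames * hi * hi                      # per-frame only, high-res
    # Assumption: high-res tokens of a frame query that frame's low-res tokens.
    fusion = frames * hi * lo
    return low_stream + high_stream + fusion


if __name__ == "__main__":
    frames = 32                              # assumed clip length
    hi = tokens_per_frame(2048, 1152)        # assumed "2K" input size
    lo = tokens_per_frame(256, 144)          # assumed downsampled size
    single = single_stream_pairs(frames, hi)
    dual = dual_stream_pairs(frames, hi, lo)
    print(f"high-res tokens/frame: {hi}, low-res tokens/frame: {lo}")
    print(f"single-stream pairs: {single:.3e}")
    print(f"dual-stream pairs:   {dual:.3e}")
    print(f"ratio: {single / dual:.1f}x")
```

Under these assumptions, the only term that grows quadratically with clip length is computed on low-resolution tokens, while the high-resolution terms grow linearly with the number of frames. This is one way to read the claim that resolution and clip length scale independently.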

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
