This paper introduces Reliev3R, a weakly-supervised training paradigm for Feed-Forward Reconstruction Models (FFRMs) that eliminates the need for multi-view geometric annotations such as 3D point maps and camera poses. Reliev3R draws 3D knowledge from monocular relative depths and sparse image correspondences obtained from zero-shot predictions of pretrained models. The method employs an ambiguity-aware relative depth loss and a trigonometry-based reprojection loss to enforce multi-view geometric consistency, achieving performance comparable to fully-supervised FFRMs while using less data.
Ditch expensive 3D annotations: Reliev3R trains high-quality 3D reconstruction models from scratch using only monocular relative depths and sparse image correspondences.
With recent advances, Feed-Forward Reconstruction Models (FFRMs) have demonstrated great potential in reconstruction quality and adaptability to multiple downstream tasks. However, their heavy reliance on multi-view geometric annotations, e.g., 3D point maps and camera poses, makes the fully-supervised training scheme of FFRMs difficult to scale up. In this paper, we propose Reliev3R, a weakly-supervised paradigm for training FFRMs from scratch without cost-prohibitive multi-view geometric annotations. Relieving the reliance on geometric sensor data and compute-intensive structure-from-motion preprocessing, our method draws 3D knowledge directly from monocular relative depths and sparse image correspondences given by zero-shot predictions of pretrained models. At the core of Reliev3R, we design an ambiguity-aware relative depth loss and a trigonometry-based reprojection loss to provide supervision for multi-view geometric consistency. Trained from scratch with less data, Reliev3R catches up with its fully-supervised counterparts, taking a step towards low-cost 3D reconstruction supervision and scalable FFRMs.
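To make the "ambiguity" concrete: monocular depth predictors return depth only up to an unknown per-image scale and shift, so any loss against their outputs must first resolve that ambiguity. The paper's exact formulation is not reproduced here; the following is a minimal sketch, assuming a PyTorch setting, of a scale-and-shift-invariant alignment in the MiDaS style as one plausible reading of an ambiguity-aware relative depth loss. The function name and tensor layout are illustrative assumptions.

```python
import torch

def ambiguity_aware_depth_loss(pred_depth: torch.Tensor,
                               mono_depth: torch.Tensor,
                               valid: torch.Tensor) -> torch.Tensor:
    """Scale/shift-invariant depth loss (hypothetical formulation).

    pred_depth, mono_depth: (B, H*W) per-pixel depths; valid: (B, H*W) bool mask.
    """
    losses = []
    for d_pred, d_mono, m in zip(pred_depth, mono_depth, valid):
        x, y = d_pred[m], d_mono[m]          # pixels with a usable pseudo-label
        # Closed-form least squares for (s, t) minimizing |s*x + t - y|^2,
        # which absorbs the unknown scale and shift of the monocular prediction.
        x_mean, y_mean = x.mean(), y.mean()
        var_x = ((x - x_mean) ** 2).mean().clamp(min=1e-8)
        s = ((x - x_mean) * (y - y_mean)).mean() / var_x
        t = y_mean - s * x_mean
        losses.append((s * x + t - y).abs().mean())   # L1 on the aligned residual
    return torch.stack(losses).mean()
```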
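Likewise, the abstract names a trigonometry-based reprojection loss without specifying it; the sketch below shows the standard correspondence-driven reprojection error that such a loss would start from: lift a matched keypoint to 3D with the depth predicted in view 1, project it into view 2 with the predicted relative pose, and penalize the pixel offset to its match. The intrinsics K, the view-1-to-view-2 transform T_12, and the keypoint format are all assumptions, not the authors' interfaces.

```python
import torch

def reprojection_loss(depth1: torch.Tensor,    # (H, W) predicted depth, view 1
                      K: torch.Tensor,         # (3, 3) camera intrinsics
                      T_12: torch.Tensor,      # (4, 4) view-1 -> view-2 pose
                      kpts1: torch.Tensor,     # (N, 2) pixel coords in view 1
                      kpts2: torch.Tensor) -> torch.Tensor:  # (N, 2) matches in view 2
    """Sparse-correspondence reprojection error (hypothetical formulation)."""
    u, v = kpts1[:, 0].long(), kpts1[:, 1].long()
    z = depth1[v, u]                                         # depths at keypoints, (N,)
    # Back-project view-1 keypoints to 3D camera coordinates.
    pix = torch.stack([kpts1[:, 0], kpts1[:, 1], torch.ones_like(z)], dim=1)  # (N, 3)
    pts1 = (torch.linalg.inv(K) @ pix.T) * z                 # (3, N)
    # Transform into view 2 and project back to pixels.
    pts2 = T_12[:3, :3] @ pts1 + T_12[:3, 3:4]               # (3, N)
    proj = K @ pts2                                          # (3, N) homogeneous
    proj = proj[:2] / proj[2:3].clamp(min=1e-6)              # (2, N) pixel coords
    return (proj.T - kpts2).norm(dim=1).mean()               # mean reprojection error
```

Because the predicted depth itself is only relative, a trigonometric reformulation (e.g., penalizing ray angles rather than raw pixel distances) is a natural way to make this loss scale-robust, which may be what the paper's name alludes to; the plain pixel-space version above is the conventional baseline.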