Mar 9, 2026

arXiv:2603.08523

BuildMamba: A Visual State-Space Based Model for Multi-Task Building Segmentation and Height Estimation from Satellite Images

Sinan U. Ulu, A. Enes Doruk, I. Can Yagmur, Bahadir K. Gunturk, Oguz Hanoglu, Hasan F. Ates

AI Summary

BuildMamba, a novel multi-task framework, is introduced for building segmentation and height estimation from satellite images, leveraging visual state-space models for efficient global context modeling. The framework incorporates a Mamba Attention Module, a Spatial-Aware Mamba-FPN, and a Mask-Aware Height Refinement module to address limitations of existing monocular depth architectures, such as boundary bleeding and height underestimation. Experiments on three benchmarks demonstrate that BuildMamba achieves state-of-the-art performance, including an RMSE of 1.77 m on the DFC23 benchmark, outperforming previous methods by 0.82 m in height estimation.

Key Contribution

Rather than adapting a monocular depth architecture, BuildMamba uses visual state-space models for joint building segmentation and height estimation from satellite imagery, and reports an RMSE 0.82 m lower than the previous state of the art on the DFC23 benchmark.

Abstract

Accurate building segmentation and height estimation from single-view RGB satellite imagery are fundamental for urban analytics, yet remain ill-posed due to structural variability and the high computational cost of global context modeling. While current approaches typically adapt monocular depth architectures, they often suffer from boundary bleeding and systematic underestimation of high-rise structures. To address these limitations, we propose BuildMamba, a unified multi-task framework designed to exploit the linear-time global modeling of visual state-space models. Motivated by the need for stronger structural coupling and computational efficiency, we introduce three modules: a Mamba Attention Module for dynamic spatial recalibration, a Spatial-Aware Mamba-FPN for multi-scale feature aggregation via gated state-space scans, and a Mask-Aware Height Refinement module using semantic priors to suppress height artifacts. Extensive experiments demonstrate that BuildMamba establishes a new performance upper bound across three benchmarks. Specifically, it achieves an IoU of 0.93 and an RMSE of 1.77 m on the DFC23 benchmark, surpassing the state of the art by 0.82 m in height estimation. Simulation results confirm the model's superior robustness and scalability for large-scale 3D urban reconstruction.
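For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how segmentation IoU and height RMSE are commonly computed on a toy raster. It is not the authors' evaluation code. The function names, the toy arrays, and especially `gate_heights` are assumptions made for illustration: `gate_heights` simply zeroes heights outside a predicted footprint, a far simpler stand-in for the idea of using a semantic mask to suppress height artifacts, and not the paper's learned Mask-Aware Height Refinement module.

```python
import math


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; define IoU as 1.0 in that case.
    return inter / union if union else 1.0


def rmse(pred_height, true_height):
    """Root-mean-square height error (metres) over all pixels."""
    sq_err = [
        (p - t) ** 2
        for pred_row, true_row in zip(pred_height, true_height)
        for p, t in zip(pred_row, true_row)
    ]
    return math.sqrt(sum(sq_err) / len(sq_err))


def gate_heights(pred_height, pred_mask):
    """Hypothetical stand-in: zero out heights outside the predicted footprint."""
    return [
        [h if m else 0.0 for h, m in zip(h_row, m_row)]
        for h_row, m_row in zip(pred_height, pred_mask)
    ]


# Toy 2x3 scene (assumed values): a 10 m building on the left, ground on the right.
true_mask = [[1, 1, 0], [1, 1, 0]]
true_height = [[10.0, 10.0, 0.0], [10.0, 10.0, 0.0]]

# Toy prediction: one false-positive mask pixel and some height "bleeding"
# onto ground pixels next to the building.
pred_mask = [[1, 1, 0], [1, 1, 1]]
pred_height = [[9.0, 11.0, 2.0], [10.0, 10.0, 1.0]]

mask_iou = iou(pred_mask, true_mask)
rmse_raw = rmse(pred_height, true_height)
rmse_gated = rmse(gate_heights(pred_height, pred_mask), true_height)

print(f"IoU: {mask_iou:.2f}")
print(f"RMSE raw: {rmse_raw:.3f} m, RMSE after gating: {rmse_gated:.3f} m")
```

In this toy case gating removes the height leaked onto one correctly classified ground pixel, so the error drops. Where the mask itself is wrong (the false-positive pixel here), gating cannot help, which is one reason coupling the two tasks matters.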

Architecture Design (Transformers, SSMs, MoE)

Computer Vision Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
