Mar 11, 2026 · arXiv:2603.10438

AsyncMDE: Real-Time Monocular Depth Estimation via Asynchronous Spatial Memory

Lianjie Ma, Yuquan Li, Bi-Ye Jiang, Z. Zhong, Han Ding, Lijun Zhu

AI Summary

AsyncMDE is an asynchronous depth perception system that pairs a foundation model, which produces high-quality spatial features, with a lightweight model that estimates depth in real time. The lightweight model fuses memory cached from the foundation model with the current observation, amortizing the foundation model's computational cost over time. AsyncMDE runs at 237 FPS on an RTX 4090 and 161 FPS on a Jetson AGX Orin, recovering 77% of the accuracy gap to the foundation model with 25× fewer parameters.
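
Two pieces of arithmetic behind this summary can be made concrete: what the reported throughputs mean as a per-frame time budget, and how sharing one foundation-model pass across several frames lowers the average compute per output frame. The sketch below is a simplified cost model under stated assumptions. The helper names `frame_time_ms` and `amortized_cost`, the `refresh_interval` parameter, and the relative costs (20 units per foundation pass, 1 unit per lightweight pass) are hypothetical and are not taken from the paper; only the 237 and 161 FPS figures come from the summary.

```python
def frame_time_ms(fps: float) -> float:
    """Mean wall-clock time per frame (ms) implied by a throughput in frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def amortized_cost(light_cost: float, fm_cost: float, refresh_interval: int) -> float:
    """Average compute per output frame when the lightweight model runs on every frame
    and one foundation-model pass is shared across `refresh_interval` frames.
    Hypothetical cost model: total compute only, ignores concurrency and latency."""
    if refresh_interval < 1:
        raise ValueError("refresh_interval must be >= 1")
    return light_cost + fm_cost / refresh_interval


# Throughputs quoted in the summary.
for device, fps in {"RTX 4090": 237.0, "Jetson AGX Orin (TensorRT)": 161.0}.items():
    print(f"{device}: {frame_time_ms(fps):.2f} ms per frame")  # 4.22 ms and 6.21 ms

# Hypothetical relative costs (not from the paper): foundation pass = 20, light pass = 1.
for k in (1, 5, 20):
    print(f"refresh every {k} frames -> {amortized_cost(1.0, 20.0, k):.1f} units/frame")
```

Note that FPS is a throughput figure, so the per-frame time is a mean interval rather than a guaranteed end-to-end latency. Because the foundation model runs in the background, the cost model above counts total compute; the foreground frame rate is set mainly by the lightweight model.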

Key Contribution

Monocular depth estimation can run at 161 FPS on an edge device (Jetson AGX Orin) while recovering 77% of the accuracy gap to the foundation model, because an asynchronous architecture reuses cached foundation-model features across frames instead of running the foundation model on every frame.
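
The "77% of the accuracy gap" figure is a relative measure. The summary does not spell out its formula or which baseline it is measured from, so the sketch below assumes one common definition: the fraction of the error gap between a reference lightweight baseline and the foundation model that is closed, (E_base - E_ours) / (E_base - E_fm), with lower error being better. The function name `gap_recovered` and the error values in the example are hypothetical, chosen only to show the arithmetic; they are not results from the paper.

```python
def gap_recovered(err_baseline: float, err_ours: float, err_foundation: float) -> float:
    """Fraction of the baseline-to-foundation-model error gap that is closed.

    Assumed definition: (err_baseline - err_ours) / (err_baseline - err_foundation).
    1.0 means matching the foundation model; 0.0 means no better than the baseline.
    Assumes lower error is better and err_baseline > err_foundation.
    """
    gap = err_baseline - err_foundation
    if gap <= 0:
        raise ValueError("baseline error must exceed foundation-model error")
    return (err_baseline - err_ours) / gap


# Made-up errors, purely to illustrate the arithmetic (not results from the paper).
print(round(gap_recovered(0.20, 0.123, 0.10), 2))  # 0.77
```

Under this definition a value above 1.0 would mean outperforming the foundation model, and a negative value would mean doing worse than the baseline.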

Abstract

Foundation-model-based monocular depth estimation offers a viable alternative to active sensors for robot perception, yet its computational cost often prohibits deployment on edge platforms. Existing methods run inference independently on every frame, failing to exploit the substantial redundancy between adjacent viewpoints during continuous robot operation. This paper presents AsyncMDE, an asynchronous depth perception system that pairs a foundation model with a lightweight model to amortize the foundation model's computational cost over time. The foundation model produces high-quality spatial features in the background, while the lightweight model runs asynchronously in the foreground: it fuses cached memory with current observations through complementary fusion, outputs depth estimates, and autoregressively updates the memory. This enables cross-frame feature reuse with bounded accuracy degradation. With only 3.83M parameters, the lightweight model runs at 237 FPS on an RTX 4090, recovering 77% of the accuracy gap to the foundation model while using 25× fewer parameters. Validated across indoor static, dynamic, and synthetic extreme-motion benchmarks, AsyncMDE degrades gracefully between refreshes and achieves 161 FPS on a Jetson AGX Orin with TensorRT, demonstrating its feasibility for real-time edge deployment.
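
The foreground/background split described in the abstract can be illustrated with a deterministic, single-threaded simulation of the schedule. This is a minimal sketch under stated assumptions, not the authors' architecture: `Memory`, `foundation_model`, `lightweight_step`, `run`, and the `fm_latency` parameter are hypothetical names, scalars stand in for feature maps, and the convex blend with weight `alpha` is a placeholder swapped in for the paper's "complementary fusion", whose details are not given here. The foundation model is modeled as finishing a fixed number of frames after launch, while the lightweight step runs on every frame, fuses the cached memory with the current observation, emits a depth value, and writes the fused state back as the new memory.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Memory:
    features: float    # scalar stand-in for a cached spatial feature map
    source_frame: int  # frame on which the last foundation-model refresh was computed
    age: int           # frames elapsed since that source frame


def foundation_model(obs: float) -> float:
    """Hypothetical slow, high-quality feature extractor (identity stand-in)."""
    return obs


def lightweight_step(mem: Memory, obs: float, alpha: float = 0.5) -> Tuple[float, Memory]:
    """Hypothetical fast model: fuse cached memory with the current observation,
    emit a depth estimate, and autoregressively write the fused state back."""
    depth = alpha * mem.features + (1.0 - alpha) * obs  # placeholder for "complementary fusion"
    new_mem = Memory(features=depth, source_frame=mem.source_frame, age=mem.age + 1)
    return depth, new_mem


def run(observations: List[float], fm_latency: int) -> List[Tuple[int, float, int]]:
    """Simulate the schedule frame by frame; returns (frame, depth, memory age used).

    The foundation model is launched whenever it is idle, and its result becomes
    visible `fm_latency` frames later; the foreground step never waits for it.
    """
    if fm_latency < 1:
        raise ValueError("fm_latency must be >= 1")
    # Warm start: one synchronous foundation-model pass on the first frame.
    mem = Memory(features=foundation_model(observations[0]), source_frame=0, age=0)
    pending: Optional[Tuple[int, int, float]] = None  # (ready_at, source_frame, features)
    outputs: List[Tuple[int, float, int]] = []
    for t, obs in enumerate(observations):
        # Background result has landed: replace the memory with foundation-model features.
        if pending is not None and pending[0] <= t:
            _, src, feats = pending
            mem = Memory(features=feats, source_frame=src, age=t - src)
            pending = None
        # Background model is idle: start a pass on the current frame. The features are
        # computed eagerly here but only become visible at ready_at.
        if pending is None:
            pending = (t + fm_latency, t, foundation_model(obs))
        # Foreground: one cheap fusion step on every frame.
        age_used = mem.age
        depth, mem = lightweight_step(mem, obs)
        outputs.append((t, depth, age_used))
    return outputs


if __name__ == "__main__":
    for t, depth, age in run([float(i) for i in range(8)], fm_latency=3):
        print(f"frame {t}: depth={depth:.2f}, memory age={age}")
```

Simulating the background model with a fixed latency keeps the schedule reproducible; a real deployment would instead run it on a separate thread, CUDA stream, or device. Under these assumptions the age of the memory used by the foreground stays between `fm_latency` and `2 * fm_latency - 1` frames in steady state, which is one way to picture the bounded staleness between refreshes that the abstract describes.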

Computer Vision · Inference & Quantization · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 37

Year: 2026

Venue: N/A
