This paper addresses the computational expense of test-time optimization for zero-shot depth completion by proposing a method that adapts only the decoder of a pre-trained depth foundation model. The key insight is that depth-relevant information is concentrated in a low-dimensional decoder subspace, allowing for efficient adaptation. By updating only this subspace with sparse depth supervision, the method achieves state-of-the-art performance with significantly reduced computational cost, establishing a new accuracy-efficiency trade-off.
Forget full-network finetuning: adapting only a low-dimensional decoder subspace unlocks state-of-the-art zero-shot depth completion with significantly improved efficiency.
Zero-shot depth completion has gained attention for its ability to generalize across environments without sensor-specific datasets or retraining. However, most existing approaches rely on diffusion-based test-time optimization, which is computationally expensive due to iterative denoising. Recent visual-prompt-based methods reduce training cost but still require repeated forward-backward passes through the full frozen network to optimize input-level prompts, resulting in slow inference. In this work, we show that adapting only the decoder is sufficient for effective test-time optimization, as depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace. Based on this insight, we propose a lightweight test-time adaptation method that updates only this low-dimensional subspace using sparse depth supervision. Our approach achieves state-of-the-art performance, establishing a new Pareto frontier between accuracy and efficiency for test-time adaptation. Extensive experiments on five indoor and outdoor datasets demonstrate consistent improvements over prior methods, highlighting the practicality of fast zero-shot depth completion.
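The idea of updating only a low-dimensional decoder subspace from sparse depth can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the subspace is identified or what the decoder looks like, so the sketch assumes a linear decoder `w0`, a fixed random `basis` of `R` directions, and synthetic features. All names (`adapt_decoder_subspace`, `predict`, `sparse_mse`) are hypothetical. What it shows is the structure of the computation: the features are computed once, and only `R` coefficients are optimized against the sparse measurements.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def adapt_decoder_subspace(features, sparse_depth, w0, basis, steps=200):
    """Fit only the subspace coefficients; w0 and basis stay frozen.

    features:     per-pixel feature vectors (stand-in for frozen encoder output)
    sparse_depth: dict {pixel index: measured depth}
    w0:           pre-trained decoder weights (assumed linear for this toy)
    basis:        direction vectors spanning the assumed low-dimensional subspace
    """
    idx = sorted(sparse_depth)
    # Project features onto the subspace once; nothing upstream is re-run.
    proj = {i: [dot(b, features[i]) for b in basis] for i in idx}
    base = {i: dot(w0, features[i]) for i in idx}
    # Conservative step size from an upper bound on the loss curvature,
    # so plain gradient descent never increases the sparse loss.
    lip = 2.0 * sum(dot(proj[i], proj[i]) for i in idx) / len(idx)
    lr = 1.0 / lip if lip > 0 else 0.0
    coeffs = [0.0] * len(basis)
    for _ in range(steps):
        grad = [0.0] * len(basis)
        for i in idx:
            err = base[i] + dot(coeffs, proj[i]) - sparse_depth[i]
            for k in range(len(basis)):
                grad[k] += 2.0 * err * proj[i][k] / len(idx)
        coeffs = [c - lr * g for c, g in zip(coeffs, grad)]
    return coeffs

def predict(features, w0, basis, coeffs):
    # Adapted decoder = frozen weights + learned combination of basis directions.
    w = [w0[j] + sum(c * b[j] for c, b in zip(coeffs, basis))
         for j in range(len(w0))]
    return [dot(w, f) for f in features]

def sparse_mse(pred, sparse_depth):
    return sum((pred[i] - d) ** 2 for i, d in sparse_depth.items()) / len(sparse_depth)

# Synthetic demo: the "true" scene differs from w0 only inside the subspace.
rng = random.Random(0)
D, R, N = 16, 3, 200  # feature dim, subspace dim, number of pixels (all arbitrary)
features = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
w0 = [rng.gauss(0, 1) for _ in range(D)]
basis = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(R)]
dense_depth = predict(features, w0, basis, [0.5, -1.0, 0.25])
sparse_depth = {i: dense_depth[i] for i in rng.sample(range(N), 20)}

coeffs = adapt_decoder_subspace(features, sparse_depth, w0, basis)
before = sparse_mse(predict(features, w0, basis, [0.0] * R), sparse_depth)
after = sparse_mse(predict(features, w0, basis, coeffs), sparse_depth)
print(f"sparse MSE before: {before:.4f}, after: {after:.6f}")
```

In this toy, each optimization step touches only `R` numbers and the cached projections, which is the source of the efficiency gain the abstract describes relative to methods that backpropagate through the full network. A real depth foundation model would have a nonlinear decoder and a learned or estimated subspace, neither of which is modeled here.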