This paper trains and evaluates ten deep stereo matching networks on a new dataset of real tree branch images (the Canterbury Tree Branches dataset), using disparity maps generated by DEFOM-Stereo as training targets. The study benchmarks these networks with perceptual and structural metrics, identifying BANet-3D as producing the best overall disparity quality and RAFT-Stereo as scoring highest on scene-level understanding (ViTScore). The paper also measures runtime on an NVIDIA Jetson Orin Super and finds that AnyNet is the only network to reach near-real-time speed (6.99 FPS at 1080P), a result that informs model selection for UAV forestry applications.
You can get near-real-time stereo depth estimation for forestry drones, but you'll have to trade off quality.
Autonomous drone-based tree pruning needs accurate, real-time depth estimation from stereo cameras. Depth is computed from disparity maps using $Z = f B/d$, so even small disparity errors cause noticeable depth mistakes at working distances. Building on our earlier work that identified DEFOM-Stereo as the best reference disparity generator for vegetation scenes, we present the first study to train and test ten deep stereo matching networks on real tree branch images. We use the Canterbury Tree Branches dataset -- 5,313 stereo pairs from a ZED Mini camera at 1080P and 720P -- with DEFOM-generated disparity maps as training targets. The ten methods cover step-by-step refinement, 3D convolution, edge-aware attention, and lightweight designs. Using perceptual metrics (SSIM, LPIPS, ViTScore) and structural metrics (SIFT/ORB feature matching), we find that BANet-3D produces the best overall quality (SSIM = 0.883, LPIPS = 0.157), while RAFT-Stereo scores highest on scene-level understanding (ViTScore = 0.799). Testing on an NVIDIA Jetson Orin Super (16 GB, independently powered) mounted on our drone shows that AnyNet reaches 6.99 FPS at 1080P -- the only near-real-time option -- while BANet-2D gives the best quality-speed balance at 1.21 FPS. We also compare 720P and 1080P processing times to guide resolution choices for forestry drone systems.
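To make the sensitivity of $Z = f B/d$ concrete, the following minimal sketch computes depth from disparity and the depth change caused by a one-pixel disparity error. The focal length and baseline values are illustrative assumptions rather than calibration figures from the paper, and the function names are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Return depth Z = f * B / d, with f and d in pixels and B in metres."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(disparity_px, disparity_err_px, focal_px, baseline_m):
    """Return the change in depth when the disparity is off by disparity_err_px."""
    true_z = depth_from_disparity(disparity_px, focal_px, baseline_m)
    wrong_z = depth_from_disparity(
        disparity_px + disparity_err_px, focal_px, baseline_m
    )
    return wrong_z - true_z


# Illustrative values only (assumptions, not taken from the paper).
FOCAL_PX = 1400.0    # assumed focal length in pixels
BASELINE_M = 0.063   # assumed stereo baseline in metres

# Disparities chosen so the true depths are roughly 1 m, 2 m, and 3 m.
for d in (88.2, 44.1, 29.4):
    z = depth_from_disparity(d, FOCAL_PX, BASELINE_M)
    err = depth_error(d, -1.0, FOCAL_PX, BASELINE_M)
    print(f"d = {d:5.1f} px  Z = {z:.2f} m  error for -1 px = {err * 100:.1f} cm")
```

Because $\partial Z / \partial d = -Z^2 / (f B)$, the same one-pixel disparity error costs roughly four times as much depth accuracy at twice the distance, which is why disparity quality matters more as the working distance grows.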