Mar 29, 2026 · arXiv:2603.27797

Which Reconstruction Model Should a Robot Use? Routing Image-to-3D Models for Cost-Aware Robotic Manipulation

Akash Anand, Aditya Agarwal, Leslie Pack Kaelbling

AI Summary

This paper introduces SCOUT, a routing framework that selects, for each input image, which 3D reconstruction model a robot should use given cost and quality tradeoffs. SCOUT decomposes reconstruction scores into the relative performance of the viewpoint-dependent models (a learned probability distribution) and the overall image difficulty (a scalar partition function estimate). Experiments on the Google Scanned Objects, BigBIRD, and YCB datasets and on robotic grasping and dexterous manipulation tasks show SCOUT outperforming routing baselines adapted from the LLM literature across various cost constraints.

Key Contribution

SCOUT lets a robot choose a 3D reconstruction model per input image, balancing reconstruction quality against cost. Because the learned network operates only over the viewpoint-dependent models, view-invariant pipelines can be added, removed, or reconfigured without retraining.

Abstract

Robotic manipulation tasks require 3D mesh reconstructions of varying quality: dexterous manipulation demands fine-grained surface detail, while collision-free planning tolerates coarser representations. Multiple reconstruction methods offer different cost-quality tradeoffs, from Image-to-3D models - whose output quality depends heavily on the input viewpoint - to view-invariant methods such as structured light scanning. Querying all models is computationally prohibitive, motivating per-input model selection. We propose SCOUT, a novel routing framework that decouples reconstruction scores into two components: (1) the relative performance of viewpoint-dependent models, captured by a learned probability distribution, and (2) the overall image difficulty, captured by a scalar partition function estimate. As the learned network operates only over the viewpoint-dependent models, view-invariant pipelines can be added, removed, or reconfigured without retraining. SCOUT also supports arbitrary cost constraints at inference time, accommodating the multi-dimensional cost constraints common in robotics. We evaluate on the Google Scanned Objects, BigBIRD, and YCB datasets under multiple mesh quality metrics, demonstrating consistent improvements over routing baselines adapted from the LLM literature across various cost constraints. We further validate the framework through robotic grasping and dexterous manipulation experiments. We release the code and additional results on our website.
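
The routing idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact form of the decomposition, so this sketch assumes a multiplicative one (predicted score = partition estimate × probability) and assumes each view-invariant pipeline has a fixed, known score. All names (`Candidate`, `predicted_scores`, `route`), cost keys, and numbers are hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    name: str
    costs: Dict[str, float]  # hypothetical cost dimensions, e.g. seconds, GPU memory
    score: float             # predicted reconstruction quality (higher is better)


def predicted_scores(probs: Dict[str, float], partition: float) -> Dict[str, float]:
    """Recombine the two components. ASSUMPTION: score_i = Z * p_i."""
    return {name: partition * p for name, p in probs.items()}


def route(candidates: List[Candidate], budget: Dict[str, float]) -> Optional[Candidate]:
    """Return the best-scoring candidate that satisfies every budget limit."""
    feasible = [
        c for c in candidates
        if all(c.costs.get(key, 0.0) <= limit for key, limit in budget.items())
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.score)


# Made-up outputs of a learned router for one input image.
probs = {"model_a": 0.5, "model_b": 0.3, "model_c": 0.2}  # relative performance
Z = 2.0                                                   # image difficulty estimate
scores = predicted_scores(probs, Z)

candidates = [
    Candidate("model_a", {"seconds": 30.0, "gpu_gb": 24.0}, scores["model_a"]),
    Candidate("model_b", {"seconds": 5.0, "gpu_gb": 8.0}, scores["model_b"]),
    Candidate("model_c", {"seconds": 2.0, "gpu_gb": 6.0}, scores["model_c"]),
    # View-invariant pipeline: fixed score assumed, no learned component involved.
    Candidate("scanner", {"seconds": 120.0, "gpu_gb": 0.0}, 0.9),
]

best = route(candidates, {"seconds": 10.0, "gpu_gb": 16.0})
print(best.name)  # model_b under these made-up numbers
```

In this sketch the budget is an ordinary dictionary passed at call time, so a different multi-dimensional constraint only changes the arguments to `route`. Adding or removing the `scanner` entry likewise leaves `probs` and `Z` untouched, which mirrors the retraining-free property the abstract describes.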

Computer Vision · Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
