Jun 10, 2026 | arXiv:2606.11563

Cross-Modal Benchmarking for Robotic Perception in Natural Environments

David Hall, Joshua Knights, Mark Cox, P. Moghadam, Peyman Moghadam

AI Summary

This paper examines how existing vision foundation models, which are largely trained on structured urban scenes, perform on field robotics perception tasks in natural environments. Using the recently released WildCross benchmark, which comprises over 476K sequential RGB frames with semi-dense depth and surface normal annotations, the authors expand on the original benchmark results with a particular emphasis on metric depth estimation. The results expose the limitations of current models in natural settings.

Key Contribution

Current vision models falter in natural environments, with the WildCross benchmark revealing critical gaps in depth estimation capabilities.

Abstract

Natural environments present a complex challenge to robotic perception systems. Current models, particularly vision foundation models, are largely trained on structured urban environments, which leads to weaknesses in their perception for field robotics tasks. We showcase the limitations of current models using our recently released WildCross benchmark, a cross-modal benchmark for place recognition and metric depth estimation in large-scale natural environments. WildCross comprises over 476K sequential RGB frames with semi-dense depth and surface normal annotations, each aligned with accurate 6DoF pose and synchronized dense lidar submaps. In this work, we provide an expanded analysis of the WildCross benchmark results, with particular emphasis on additional metric depth estimation experiments. The code repository and dataset for this work are available at https://csiro-robotics.github.io/WildCross.
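Because the depth annotations are described as semi-dense, metric depth errors can only be scored on pixels that actually carry ground truth. The sketch below illustrates that idea with three commonly used depth metrics (absolute relative error, RMSE, and the fraction of pixels with ratio error below 1.25). It is not taken from the WildCross code: the function name `depth_metrics`, the convention that missing ground truth is encoded as 0 or NaN, and the `min_depth`/`max_depth` thresholds are all assumptions made for illustration.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=float("inf")):
    """Score metric depth predictions on valid ground-truth pixels only.

    pred, gt: flat sequences of depths in metres, same length.
    Assumption: pixels without an annotation hold 0 or NaN in gt.
    """
    pairs = []
    for p, g in zip(pred, gt):
        # NaN fails both comparisons, so unannotated pixels are skipped.
        if min_depth < g < max_depth:
            # Clamp the prediction so the ratio test never divides by zero.
            pairs.append((max(p, min_depth), g))
    if not pairs:
        raise ValueError("no valid ground-truth pixels")

    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1, "n_valid": n}


# Hypothetical toy values: the third pixel has no ground truth and is ignored.
metrics = depth_metrics([2.0, 4.5, 9.0, 4.0], [2.0, 4.0, 0.0, 8.0])
```

Masking on the ground truth rather than on the prediction keeps the comparison fair across models: every model is scored on the same set of annotated pixels, regardless of where it produces confident output.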

Eval Frameworks & Benchmarks | Multimodal Models | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
