The paper introduces Trinity, a transformer-based architecture for joint class-specific semantic segmentation and class-agnostic terrain segmentation in unstructured outdoor environments. By segmenting terrain based on visual appearance without predefined semantic labels or robot-dependent traversability scores, Trinity learns robot-agnostic visual terrain priors. The approach is trained on a new synthetic dataset, RUGDSynth, and evaluated on a new real-world dataset, EXTerra, demonstrating the effectiveness of the joint segmentation approach.
Ditch robot-specific terrain labels: Trinity learns generalizable terrain understanding from visual appearance alone, unlocking transferability across diverse robotic platforms.
Terrain understanding is fundamental for mobile robots operating in unstructured outdoor environments. Existing vision-based traversability estimation methods rely on robot-specific annotations or semantic class mappings, which limits transferability across platforms and requires costly re-annotation when robot capabilities change. Standard semantic segmentation methods, in turn, focus only on a fixed set of predefined classes, which do not capture the variety of terrains. In this work, we propose Trinity, a transformer-based architecture that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation within a unified network. Terrain regions are segmented based solely on visual appearance, without predefined semantic labels or robot-dependent traversability scores. This formulation enables the learning of robot-agnostic visual terrain priors that can be combined with robot-specific experience for downstream tasks such as traversability estimation, visual odometry, and mission planning. To enable large-scale training with diverse terrain appearances, we extend the OAISYS simulator and introduce RUGDSynth, a synthetic dataset inspired by RUGD with class-agnostic terrain samples. Furthermore, we present the EXTerra Dataset, which provides real-world images annotated with both class-specific and class-agnostic terrain labels. Experiments demonstrate the feasibility of the proposed task and the effectiveness of our joint segmentation approach in complex outdoor environments. Code and datasets will be released with this publication (after review).