NVIDIA | CMU RI | KIT | Jun 11, 2026 | arXiv:2606.13497

SPARC: Reliable Spatial Annotations from Robot Demonstrations at Scale

Nils Blank, Paul Mattes, Maximilian Xiling Li, Jakub Suliga, Thomas Roth, Moritz Reuss, Pankhuri Vanjani, Rudolf Lioutikov

AI Summary

The SPARC framework automatically labels robot demonstrations with structured spatial annotations and assigns each annotation a reliability score. By exploiting the spatio-temporal structure of robot tasks, SPARC reduces noisy labels while retaining more useful samples than existing automated pipelines. On 1.7k human-annotated demonstrations, SPARC outperforms detection-only baselines in localization accuracy, and models finetuned on its annotations achieve state-of-the-art results on object-grounding and pointing benchmarks among similarly sized models.

Key Contribution

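SPARC derives a reliability signal from the spatio-temporal structure of robot tasks, which reduces noisy labels and retains more useful samples than detector confidence alone. Policies trained on SPARC-generated annotations outperform baselines in cluttered, visually ambiguous real-world scenes.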

Abstract

This work introduces Spatial Annotations from Robot Demonstrations with Reliability Calibration (SPARC), a risk-aware framework that automatically labels robot demonstrations with structured spatial annotations and assigns each annotation a reliability score. Structured spatial annotations, such as bounding boxes, object trajectories, and manipulation phase labels, benefit a broad range of robotics applications from training grounded robot policies and embodied foundation models to motion planning and hierarchical task composition. Existing automated pipelines generate such annotations at scale but provide no reliable quality signal: detector confidence is poorly calibrated for annotation correctness, forcing a choice between accepting noisy labels or discarding useful samples. In contrast to existing automated pipelines, SPARC leverages the spatio-temporal structure inherent to robot tasks to generate a reliability signal, reducing noisy labels and retaining more useful samples. We further introduce Interaction-Aware Bench (IA-Bench), a benchmark that measures model accuracy in grounding the locations of interacted objects in robot demonstrations. On 1.7k human-annotated demonstrations spanning diverse embodiments and scenarios, SPARC significantly outperforms detection-only baselines in localization accuracy while retaining three times more samples at high-precision operating points. Our experiments demonstrate that models finetuned on our annotations achieve state-of-the-art results on object-grounding and pointing benchmarks among similarly sized models, while remaining competitive on broader spatial-reasoning suites without manually verified or annotated training data. Furthermore, policies trained on SPARC-generated annotations outperform baselines in cluttered, visually ambiguous real-world scenes. Code, data, and models are available at intuitive-robots.github.io/sparc-labeling.
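
To make the phrase "retaining more samples at high-precision operating points" concrete, the following is a minimal sketch of how such an operating point can be computed from per-annotation scores. It is not SPARC's implementation: the function name `retention_at_precision`, its arguments, and the toy data are all hypothetical, and it assumes each annotation already has a scalar score (reliability or detector confidence) and a human-verified correctness flag.

```python
from typing import Sequence, Tuple


def retention_at_precision(
    scores: Sequence[float],
    correct: Sequence[bool],
    target_precision: float,
) -> Tuple[float, float]:
    """Find the lowest score threshold whose kept set meets the target precision.

    Annotations with score >= threshold are kept. Returns (threshold,
    retained_fraction); (inf, 0.0) means no threshold reaches the target.
    Hypothetical helper for illustration only.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")

    # Walk annotations from most to least trusted.
    pairs = sorted(zip(scores, correct), key=lambda p: p[0], reverse=True)
    best_threshold, best_kept, hits = float("inf"), 0, 0

    for i, (score, ok) in enumerate(pairs, start=1):
        hits += int(ok)
        # A threshold is only well defined at the end of a run of tied scores.
        last_of_tie = i == len(pairs) or pairs[i][0] != score
        if last_of_tie and hits / i >= target_precision:
            best_threshold, best_kept = score, i

    total = len(pairs)
    return best_threshold, (best_kept / total if total else 0.0)


# Toy data (made up): six annotations, scores paired with correctness flags.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_correct = [True, True, True, False, True, False]
threshold, retained = retention_at_precision(toy_scores, toy_correct, 0.9)
print(threshold, retained)
```

Running the same function once with detector confidence and once with a reliability score as `scores` would give two retained fractions at the same target precision, which is the kind of comparison the abstract describes.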

Multimodal Models | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 68

Year: 2026

Venue: N/A
