This paper introduces UMI-Bench 1.0, a benchmark for real-world evaluation of Universal Manipulation Interface (UMI)-style robotic manipulation policies. Existing real-world benchmarks are not designed around the UMI data-to-deployment setting. UMI-Bench addresses this by aligning data collection, scene reset, policy execution, result logging, and task-factor analysis within a unified protocol. Its main contribution is a reproducible and auditable evaluation process for measuring how UMI-trained policies perform in real physical environments.
UMI-Bench 1.0 standardizes real-robot evaluation of UMI-style manipulation policies so that results are reproducible and auditable.
Real-robot evaluation is essential for understanding whether learned manipulation policies can operate reliably outside curated demonstrations. This need is particularly pressing for Universal Manipulation Interface (UMI)-style policies, whose performance depends on the coupling between wrist-view observations, action representation, data collection, and physical deployment. Existing real-world benchmarks have made important progress, but they are not designed around this UMI data-to-deployment setting. We present UMI-Bench 1.0, a local-first real-robot benchmark for standardized evaluation of UMI-style manipulation policies. To the best of our knowledge, this is the first benchmark dedicated to real-world evaluation of UMI-based manipulation models. UMI-Bench aligns data collection, scene reset, policy execution, result logging, and task-factor analysis within a unified protocol. By making the full evaluation process reproducible and auditable, UMI-Bench provides a practical testbed for measuring how UMI-trained policies generalize to real physical manipulation.