The authors introduce a marker-less multi-camera system for capturing hand-object interactions in diverse, unconstrained real-world environments. The system pairs a lightweight, back-mounted multi-camera rig with a VR headset, synchronized and calibrated together, so that an ego-exo tracking pipeline can produce precise 3D annotations of hands and objects. The resulting large-scale dataset, SHOW3D, reduces the trade-off between environmental realism and annotation accuracy, and models trained on it generalize better in downstream tasks than models trained on studio-captured data.
Training data for hand-object interaction no longer has to trade realism for annotation accuracy: SHOW3D substantially narrows that gap.
Accurate 3D understanding of human hands and objects during manipulation remains a significant challenge for egocentric computer vision. Existing hand-object interaction datasets are predominantly captured in controlled studio settings, which limits both environmental diversity and the ability of models trained on such data to generalize to real-world scenarios. To address this challenge, we introduce a novel marker-less multi-camera system that allows for nearly unconstrained mobility in genuinely in-the-wild conditions, while still having the ability to generate precise 3D annotations of hands and objects. The capture system consists of a lightweight, back-mounted, multi-camera rig that is synchronized and calibrated with a user-worn VR headset. For 3D ground-truth annotation of hands and objects, we develop an ego-exo tracking pipeline and rigorously evaluate its quality. Finally, we present SHOW3D, the first large-scale dataset with 3D annotations that show hands interacting with objects in diverse real-world environments, including outdoor settings. Our approach significantly reduces the fundamental trade-off between environmental realism and accuracy of 3D annotations, which we validate with experiments on several downstream tasks.
show3d-dataset.github.io