The authors introduce ENIGMA-360, a new dataset of synchronized egocentric and exocentric videos captured in a real industrial setting to facilitate research on human behavior understanding. The dataset comprises 360 videos with temporal and spatial annotations, enabling the study of temporal action segmentation, keystep recognition, and egocentric human-object interaction detection. Baseline experiments demonstrate the limitations of existing state-of-the-art methods on this challenging dataset, highlighting the need for more robust ego-exo understanding models.
Current methods struggle to understand human behavior in industrial settings, as evidenced by the challenging ENIGMA-360 dataset of synchronized ego-exo videos.
Understanding human behavior from complementary egocentric (ego) and exocentric (exo) points of view enables the development of systems that can support workers in industrial environments and enhance their safety. However, progress in this area is hindered by the lack of datasets capturing both views in realistic industrial scenarios. To address this gap, we introduce ENIGMA-360, a new ego-exo dataset acquired in a real industrial scenario. The dataset comprises 180 egocentric and 180 exocentric procedural videos, temporally synchronized so that the two views offer complementary information about the same scene. The 360 videos have been labeled with temporal and spatial annotations, enabling the study of different aspects of human behavior in the industrial domain. We provide baseline experiments for three foundational tasks in human behavior understanding: 1) Temporal Action Segmentation, 2) Keystep Recognition, and 3) Egocentric Human-Object Interaction Detection, showing the limits of state-of-the-art approaches in this challenging scenario. These results highlight the need for new models capable of robust ego-exo understanding in real-world environments. We publicly release the dataset and its annotations at https://iplab.dmi.unict.it/ENIGMA-360.