The authors introduce BigMaQ, a large-scale dataset of rhesus macaque interactions comprising more than 750 scenes with detailed 3D pose descriptions, obtained by adapting a high-quality macaque template mesh to individual monkeys. The dataset aims to bridge the gap between image-based animal action recognition and accurate 3D pose and shape reconstruction, a combination that has so far been missing for non-human primates. By pairing image and video encoder features with the 3D pose descriptors, the authors report substantial improvements in mean average precision (mAP) on the derived BigMaQ500 action recognition benchmark.
BigMaQ links video of interacting macaques to detailed 3D pose and shape; adding these pose descriptors to image and video encoder features improves action recognition mAP.
The recognition of dynamic and social behavior in animals is fundamental for advancing ethology, ecology, medicine and neuroscience. Recent progress in deep learning has enabled automated behavior recognition from video, yet accurate reconstruction of three-dimensional (3D) pose and shape has not been integrated into this process. Especially for non-human primates, mesh-based tracking efforts lag behind those for other species, leaving pose descriptions restricted to sparse keypoints that cannot fully capture the richness of action dynamics. To address this gap, we introduce the Big Macaque 3D Motion and Animation Dataset (BigMaQ), a large-scale dataset comprising more than 750 scenes of interacting rhesus macaques with detailed 3D pose descriptions. Extending previous surface-based animal tracking methods, we construct subject-specific textured avatars by adapting a high-quality macaque template mesh to individual monkeys. This allows us to provide pose descriptions that are more accurate than those of previous state-of-the-art surface-based animal tracking methods. From the original dataset, we derive BigMaQ500, an action recognition benchmark that links surface-based pose vectors to single frames across multiple individual monkeys. By pairing features extracted from established image and video encoders with and without our pose descriptors, we demonstrate substantial improvements in mean average precision (mAP) when pose information is included. With these contributions, BigMaQ establishes the first dataset that both integrates dynamic 3D pose-shape representations into the learning task of animal action recognition and provides a rich resource to advance the study of visual appearance, posture, and social interaction in non-human primates. The code and data are publicly available at https://martinivis.github.io/BigMaQ/.
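The comparison described above (encoder features with and without pose descriptors, scored by mAP) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: fusing by simple concatenation, the array shapes, and the function names `fuse`, `average_precision`, and `mean_average_precision` are all assumptions made for the example. The mAP here is the common ranking-based definition, averaged over action classes.

```python
import numpy as np


def fuse(visual_features, pose_descriptors):
    """Concatenate per-sample visual features with pose descriptors.

    Assumption: fusion by concatenation along the feature axis; the
    paper may combine the two inputs differently.
    """
    return np.concatenate([visual_features, pose_descriptors], axis=1)


def average_precision(scores, labels):
    """Average precision for one class.

    Samples are ranked by descending score; AP is the mean of the
    precision values measured at the rank of each positive sample.
    Returns NaN when the class has no positive samples.
    """
    order = np.argsort(-scores, kind="stable")
    hits = labels[order].astype(float)
    n_pos = hits.sum()
    if n_pos == 0:
        return float("nan")
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / n_pos)


def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP, skipping classes without positives."""
    n_classes = score_matrix.shape[1]
    aps = [
        average_precision(score_matrix[:, c], label_matrix[:, c])
        for c in range(n_classes)
    ]
    return float(np.nanmean(aps))
```

In use, one would train the same classifier head once on the visual features alone and once on `fuse(visual_features, pose_descriptors)`, then compare `mean_average_precision` on held-out samples for the two runs.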