InterMesh explicitly models human-environment interactions within an end-to-end multi-person human mesh recovery pipeline by incorporating structured interaction semantics from a human-object interaction detector. This enriched representation is then integrated into existing HMR architectures using a Contextual Interaction Encoder and Interaction-Guided Refiner. Experiments across multiple datasets (3DPW, MuPoTS, CMU Panoptic, Hi4D, and CHI3D) demonstrate significant improvements, including a 9.9% MPJPE reduction on CMU Panoptic and 8.2% on Hi4D.
Explicitly modeling human-object interactions reduces MPJPE in multi-person human mesh recovery by up to 9.9%, indicating that interaction context matters for estimating human pose and shape in complex scenes.
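For readers unfamiliar with the metric behind these percentages, the following is a minimal sketch of MPJPE (mean per-joint position error, the average Euclidean distance between predicted and ground-truth 3D joints) and of a relative reduction. The joint coordinates and the 100.0 mm / 90.1 mm values are hypothetical illustrations, not results from the paper, and the root alignment that evaluation protocols commonly apply before measuring is omitted.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical 3-joint skeleton in millimetres (illustrative only, not from the paper).
gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 10.0)]

error_mm = mpjpe(pred, gt)  # per-joint errors are 5, 0, and 10 mm

# Hypothetical baseline and improved errors chosen to yield a 9.9% reduction.
reduction_pct = relative_reduction(100.0, 90.1)
```

A relative reduction is measured against the baseline error, so the same percentage corresponds to different absolute gains in millimetres on different datasets.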
Humans constantly interact with their surroundings. Existing end-to-end multi-person human mesh recovery (HMR) methods, typically based on the DETR framework, capture inter-human relationships through self-attention across all human queries. However, these approaches model interactions only implicitly and lack explicit reasoning about how humans interact with objects and with each other. In this paper, we propose InterMesh, a simple yet effective framework that explicitly incorporates human-environment interaction information into the HMR pipeline. By leveraging a human-object interaction detector, InterMesh enriches query representations with structured interaction semantics, enabling more accurate pose and shape estimation. We design two lightweight modules, the Contextual Interaction Encoder and the Interaction-Guided Refiner, to integrate these features into existing HMR architectures with minimal overhead. We validate our approach through extensive experiments on the 3DPW, MuPoTS, CMU Panoptic, Hi4D, and CHI3D datasets, demonstrating remarkable improvements over state-of-the-art methods. Notably, InterMesh reduces MPJPE by 9.9% on CMU Panoptic and 8.2% on Hi4D, highlighting its effectiveness in scenarios with complex human-object and inter-human interactions.
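One generic way to picture "enriching query representations" with interaction features is cross-attention: each human query attends over a set of interaction feature vectors and adds the attended context back to itself. The sketch below illustrates that idea only. It is not the paper's Contextual Interaction Encoder or Interaction-Guided Refiner, whose internals are not described here; the names `enrich_queries` and `interaction_tokens` are hypothetical, and the learned projections a real attention layer would use are left out for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def enrich_queries(human_queries, interaction_tokens):
    """Assumed mechanism: scaled dot-product cross-attention with a residual add.

    Each human query attends over all interaction tokens, and the weighted
    sum of tokens is added to the query.
    """
    dim = len(human_queries[0])
    scale = 1.0 / math.sqrt(dim)
    enriched = []
    for q in human_queries:
        weights = softmax([dot(q, t) * scale for t in interaction_tokens])
        context = [
            sum(w * t[i] for w, t in zip(weights, interaction_tokens))
            for i in range(dim)
        ]
        enriched.append([qi + ci for qi, ci in zip(q, context)])
    return enriched

# Two hypothetical 2-D human queries and two hypothetical interaction tokens.
queries = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
enriched = enrich_queries(queries, tokens)
```

In this toy input each query weights the token it is most similar to more heavily, so the enriched vector leans toward that token while keeping the original query through the residual connection.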