This paper introduces IAM, an identity-aware framework for text-driven human motion generation that explicitly models the relationship between body morphology and motion dynamics. Identity is represented with multimodal signals such as natural-language descriptions and visual cues, and IAM jointly synthesizes motion sequences and body shape parameters so that identity cues directly influence the generated motion. Experiments on motion capture datasets and in-the-wild videos show that IAM improves motion realism and motion-identity consistency compared to identity-neutral approaches.
Human motion generation gets a dose of reality: IAM shows that explicitly modeling body morphology and identity leads to more realistic and consistent movements.
Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morphology on motion dynamics. In practice, attributes such as body proportions, mass distribution, and age significantly affect how actions are performed, and neglecting this coupling often leads to physically inconsistent motions. We propose an identity-aware motion generation framework that explicitly models the relationship between body morphology and motion dynamics. Instead of relying on explicit geometric measurements, identity is represented using multimodal signals, including natural language descriptions and visual cues. We further introduce a joint motion-shape generation paradigm that simultaneously synthesizes motion sequences and body shape parameters, allowing identity cues to directly modulate motion dynamics. Extensive experiments on motion capture datasets and large-scale in-the-wild videos demonstrate improved motion realism and motion-identity consistency while maintaining high motion quality. Project page: https://vjwq.github.io/IAM
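To make the joint motion-shape generation paradigm concrete, here is a minimal PyTorch sketch (not the authors' implementation) of a decoder that conditions on a text embedding and an identity embedding and emits both a motion sequence and SMPL-style shape parameters from the same hidden states, so identity cues can modulate the motion directly. All module names, dimensions (e.g., the 263-dim HumanML3D-style motion features and 10 shape betas), and the specific conditioning scheme are assumptions for illustration.

```python
import torch
import torch.nn as nn


class JointMotionShapeDecoder(nn.Module):
    """Hypothetical joint motion-shape decoder: one conditional model,
    two heads (per-frame motion features + a single shape vector)."""

    def __init__(self, text_dim=512, identity_dim=512, latent_dim=256,
                 motion_dim=263, num_betas=10, num_frames=196):
        super().__init__()
        # Fuse the action description with identity cues (text and/or visual embedding).
        self.cond_proj = nn.Linear(text_dim + identity_dim, latent_dim)
        # Learned per-frame queries decoded by a small transformer.
        self.frame_queries = nn.Parameter(torch.randn(num_frames, latent_dim))
        layer = nn.TransformerDecoderLayer(d_model=latent_dim, nhead=4,
                                           dim_feedforward=512, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        # Two output heads sharing the same identity-conditioned hidden states.
        self.motion_head = nn.Linear(latent_dim, motion_dim)
        self.shape_head = nn.Linear(latent_dim, num_betas)

    def forward(self, text_emb, identity_emb):
        # text_emb: (B, text_dim) action description; identity_emb: (B, identity_dim)
        cond = self.cond_proj(torch.cat([text_emb, identity_emb], dim=-1))  # (B, latent)
        memory = cond.unsqueeze(1)                                          # (B, 1, latent)
        queries = self.frame_queries.unsqueeze(0).expand(cond.size(0), -1, -1)
        hidden = self.decoder(tgt=queries, memory=memory)                   # (B, T, latent)
        motion = self.motion_head(hidden)                                   # (B, T, motion_dim)
        betas = self.shape_head(hidden.mean(dim=1))                         # (B, num_betas)
        return motion, betas


if __name__ == "__main__":
    model = JointMotionShapeDecoder()
    text_emb = torch.randn(2, 512)      # e.g., embedding of "a person jumps over a box"
    identity_emb = torch.randn(2, 512)  # e.g., embedding of "a tall, heavyset man"
    motion, betas = model(text_emb, identity_emb)
    print(motion.shape, betas.shape)    # (2, 196, 263), (2, 10)
```

The point of the sketch is the coupling: because motion and shape are decoded from the same identity-conditioned representation, changing the identity embedding shifts both the body shape and the way the action is performed, rather than retargeting a canonical motion after the fact.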