Snap Research, Specs Inc., UIUC | Mar 30, 2026 | arXiv:2603.28766

HandX: Scaling Bimanual Motion and Interaction Generation

Zimu Zhang, Yucheng Zhang, Yuchen Zhang, Xiyan Xu, Ziyin Wang, Sirui Xu, Kaimao Zhou, Kai Zhou, Bing Zhou, Chuan Guo, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui, Liangxin Gui

AI Summary

The paper introduces HandX, a new dataset and benchmark for bimanual motion and interaction generation that addresses the limitations of existing datasets in capturing fine-grained finger dynamics and inter-hand coordination. The authors consolidate existing datasets, collect a new motion-capture dataset, and introduce a decoupled annotation strategy that uses LLMs to generate semantically rich descriptions. Experiments with diffusion and autoregressive models on HandX demonstrate high-quality dexterous motion generation and show that performance scales with model and dataset size.

Key Contribution

LLMs can scalably annotate motion capture data to produce semantically rich descriptions of bimanual interactions, enabling higher-quality generation of dexterous hand motions.

Abstract

Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior (finger articulation, contact timing, and inter-hand coordination), and existing resources lack high-fidelity bimanual sequences that capture nuanced finger dynamics and collaboration. To fill this gap, we present HandX, a unified foundation spanning data, annotation, and evaluation. We consolidate and filter existing datasets for quality, and collect a new motion-capture dataset targeting underrepresented bimanual interactions with detailed finger dynamics. For scalable annotation, we introduce a decoupled strategy that extracts representative motion features, e.g., contact events and finger flexion, and then leverages reasoning from large language models to produce fine-grained, semantically rich descriptions aligned with these features. Building on the resulting data and annotations, we benchmark diffusion and autoregressive models with versatile conditioning modes. Experiments demonstrate high-quality dexterous motion generation, supported by our newly proposed hand-focused metrics. We further observe clear scaling trends: larger models trained on larger, higher-quality datasets produce more semantically coherent bimanual motion. Our dataset is released to support future research.
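The feature-extraction half of the decoupled annotation strategy can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 1 cm contact threshold, the frame rate, and the input layout (per-frame lists of fingertip positions in meters) are all assumptions chosen for illustration. The sketch detects inter-hand fingertip contact intervals, measures a joint angle as a flexion proxy, and turns the result into a plain-text summary of the kind that could be handed to a language model.

```python
import math


def contact_events(left_tips, right_tips, threshold=0.01):
    """Return (start, end) frame intervals of inter-hand fingertip contact.

    left_tips / right_tips: one entry per frame, each a list of (x, y, z)
    fingertip positions. The 0.01 m threshold is a hypothetical value.
    """
    events, start, last = [], None, -1
    for t, (lf, rf) in enumerate(zip(left_tips, right_tips)):
        last = t
        touching = any(math.dist(a, b) < threshold for a in lf for b in rf)
        if touching and start is None:
            start = t  # contact begins
        elif not touching and start is not None:
            events.append((start, t - 1))  # contact ended on previous frame
            start = None
    if start is not None:
        events.append((start, last))  # contact persists to the final frame
    return events


def joint_angle_deg(a, b, c):
    """Angle at joint b (degrees) between segments b->a and b->c.

    A straight finger gives about 180; smaller values mean more flexion.
    """
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        raise ValueError("joint positions must be distinct")
    cos = sum(x * y for x, y in zip(u, v)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def summarize(events, fps=30):
    """Render contact intervals as text (fps is an assumed frame rate)."""
    if not events:
        return "No inter-hand fingertip contact."
    parts = [f"contact from {s / fps:.2f}s to {e / fps:.2f}s" for s, e in events]
    return "Inter-hand fingertip " + "; ".join(parts) + "."
```

A summary string produced this way would be one input among several to the description-generation step; how the paper actually formats features for the language model is not stated in the abstract.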

Computer Vision | Data Curation & Synthetic Data | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
