DexMulti tackles dexterous manipulation by decomposing demonstrations into object-centric skills with temporal boundaries. It uses a retrieve-align-execute paradigm, retrieving demonstrated skills based on object geometry, aligning them to the observed object state via an uncertainty-aware estimator, and executing them. Experiments on three multi-stage tasks with two dexterous hands show DexMulti achieves a 66% success rate on training objects with limited demonstrations, outperforming diffusion policy baselines and generalizing to unseen objects and spatial variations.
Forget end-to-end training: DexMulti's "retrieve-align-execute" approach lets robots master complex, multi-stage dexterous tasks from just a handful of demonstrations.
Dexterous hands enable concurrent prehensile and nonprehensile manipulation, such as holding one object while interacting with another, a capability essential for everyday tasks yet underexplored in robotics. Learning such long-horizon, contact-rich multi-stage behaviors is challenging because demonstrations are expensive to collect and end-to-end policies require substantial data to generalize across varied object geometries and placements. We present DexMulti, a sample-efficient approach for real-world dexterous multi-stage manipulation that decomposes demonstrations into object-centric skills with well-defined temporal boundaries. Rather than learning monolithic policies, our method follows a retrieve-align-execute paradigm: it retrieves demonstrated skills based on current object geometry, aligns them to the observed object state using an uncertainty-aware estimator that tracks centroid and yaw, and executes them. We evaluate on three multi-stage tasks requiring concurrent manipulation (Grasp + Pull, Grasp + Open, and Grasp + Grasp) across two dexterous hands (Allegro and LEAP) in over 1,000 real-world trials. Our approach achieves an average success rate of 66% on training objects with only 3-4 demonstrations per object, outperforming diffusion policy baselines by 2-3x while requiring far fewer demonstrations. Results demonstrate robust generalization to held-out objects and spatial variations up to ±25 cm.
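To make the align step concrete, here is a minimal sketch of warping a demonstrated trajectory into the observed object's frame using only a centroid translation and a yaw rotation. This is a hypothetical illustration, not the paper's implementation: the function and variable names are invented, and the uncertainty-aware estimation of centroid and yaw is assumed to have already happened upstream.

```python
import numpy as np

def yaw_rotation(theta):
    """2D rotation matrix for a rotation by `theta` about the vertical (yaw) axis,
    acting in the x-y plane."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def align_trajectory(demo_traj, demo_centroid, demo_yaw, obs_centroid, obs_yaw):
    """Re-express a demonstrated x-y trajectory relative to the observed object pose.

    demo_traj:      (T, 2) array of demonstrated waypoints in the workspace frame.
    demo_centroid:  (2,) object centroid at demonstration time.
    demo_yaw:       object yaw at demonstration time (radians).
    obs_centroid:   (2,) estimated centroid of the currently observed object.
    obs_yaw:        estimated yaw of the currently observed object (radians).
    Returns the (T, 2) trajectory mapped into the observed object's frame.
    """
    # Rotate by the yaw difference, about the demo centroid, then translate
    # so the demo centroid lands on the observed centroid.
    R = yaw_rotation(obs_yaw - demo_yaw)
    return (R @ (np.asarray(demo_traj) - demo_centroid).T).T + obs_centroid
```

Under this parameterization, a skill demonstrated for one object placement can be replayed for any translated and yaw-rotated placement of the same (or geometrically similar) object, which is one way the abstract's claim of generalization to spatial variations could be realized.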