DexMulti tackles dexterous manipulation by decomposing demonstrations into object-centric skills with temporal boundaries. It uses a retrieve-align-execute paradigm, retrieving demonstrated skills based on object geometry, aligning them to the observed object state via an uncertainty-aware estimator, and executing them. Experiments on three multi-stage tasks with two dexterous hands show DexMulti achieves a 66% success rate on training objects with limited demonstrations, outperforming diffusion policy baselines and generalizing to unseen objects and spatial variations.
Forget end-to-end training: DexMulti's "retrieve-align-execute" approach lets robots master complex, multi-stage dexterous tasks from just a handful of demonstrations.
Dexterous hands enable concurrent prehensile and nonprehensile manipulation, such as holding one object while interacting with another, a capability essential for everyday tasks yet underexplored in robotics. Learning such long-horizon, contact-rich multi-stage behaviors is challenging because demonstrations are expensive to collect and end-to-end policies require substantial data to generalize across varied object geometries and placements. We present DexMulti, a sample-efficient approach for real-world dexterous multi-stage manipulation that decomposes demonstrations into object-centric skills with well-defined temporal boundaries. Rather than learning monolithic policies, our method follows a retrieve-align-execute paradigm: it retrieves demonstrated skills based on current object geometry, aligns them to the observed object state using an uncertainty-aware estimator that tracks centroid and yaw, and executes them. We evaluate on three multi-stage tasks requiring concurrent manipulation (Grasp + Pull, Grasp + Open, and Grasp + Grasp) across two dexterous hands (Allegro and LEAP) in over 1,000 real-world trials. Our approach achieves an average success rate of 66% on training objects with only 3-4 demonstrations per object, outperforming diffusion policy baselines by 2-3x while requiring far fewer demonstrations. Results demonstrate robust generalization to held-out objects and spatial variations up to ±25 cm.
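To make the align step concrete, here is a minimal sketch of warping a demonstrated trajectory into the observed object's frame using only a centroid translation and a yaw rotation. This is a hypothetical illustration, not the paper's implementation: the function and variable names are invented, and the uncertainty-aware estimation of centroid and yaw is assumed to have already happened upstream.

```python
import numpy as np

def yaw_rotation(theta):
    """2D rotation matrix for a rotation by `theta` about the vertical (yaw) axis,
    acting in the x-y plane."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def align_trajectory(demo_traj, demo_centroid, demo_yaw, obs_centroid, obs_yaw):
    """Re-express a demonstrated x-y trajectory relative to the observed object pose.

    demo_traj:      (T, 2) array of demonstrated waypoints in the workspace frame.
    demo_centroid:  (2,) object centroid at demonstration time.
    demo_yaw:       object yaw at demonstration time (radians).
    obs_centroid:   (2,) estimated centroid of the currently observed object.
    obs_yaw:        estimated yaw of the currently observed object (radians).
    Returns the (T, 2) trajectory mapped into the observed object's frame.
    """
    # Rotate by the yaw difference, about the demo centroid, then translate
    # so the demo centroid lands on the observed centroid.
    R = yaw_rotation(obs_yaw - demo_yaw)
    return (R @ (np.asarray(demo_traj) - demo_centroid).T).T + obs_centroid
```

Under this parameterization, a skill demonstrated for one object placement can be replayed for any translated and yaw-rotated placement of the same (or geometrically similar) object, which is one way the abstract's claim of generalization to spatial variations could be realized.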