The paper introduces Dex4D, a framework for learning task-agnostic dexterous manipulation skills in simulation that transfer to real-world tasks without fine-tuning. Dex4D trains a policy conditioned on 3D point tracks to move objects to desired poses, using thousands of simulated objects and pose configurations. The key result is zero-shot transfer to diverse real-world manipulation tasks, with consistent improvements over baselines and strong generalization to novel objects and environments.
Skip task-specific reward engineering: Dex4D learns a single, generalizable dexterous manipulation policy in simulation that transfers zero-shot to real-world tasks by tracking object-centric points.
Learning generalist policies capable of accomplishing a plethora of everyday tasks remains an open challenge in dexterous manipulation. In particular, collecting large-scale manipulation data via real-world teleoperation is expensive and difficult to scale. While learning in simulation provides a feasible alternative, designing multiple task-specific environments and rewards for training is similarly challenging. We propose Dex4D, a framework that instead leverages simulation for learning task-agnostic dexterous skills that can be flexibly recomposed to perform diverse real-world manipulation tasks. Specifically, Dex4D learns a domain-agnostic 3D point track conditioned policy capable of manipulating any object to any desired pose. We train this 'Anypose-to-Anypose' policy in simulation across thousands of objects with diverse pose configurations, covering a broad space of robot-object interactions that can be composed at test time. At deployment, this policy can be zero-shot transferred to real-world tasks without finetuning, simply by prompting it with desired object-centric point tracks extracted from generated videos. During execution, Dex4D uses online point tracking for closed-loop perception and control. Extensive experiments in simulation and on real robots show that our method enables zero-shot deployment for diverse dexterous manipulation tasks and yields consistent improvements over prior baselines. Furthermore, we demonstrate strong generalization to novel objects, scene layouts, backgrounds, and trajectories, highlighting the robustness and scalability of the proposed framework.
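To make the idea of prompting a policy with object-centric point targets more concrete, here is a minimal toy sketch in plain Python. It is not the Dex4D implementation: the learned policy is replaced by a hypothetical proportional rule (`toy_step`), the "tracker" simply reads the simulated point positions back at every step, and all function names, the yaw-only rotation, and the `gain` parameter are assumptions made for illustration. It only shows how a desired pose can be expressed as goal positions for a set of object points, and how a closed loop can re-observe those points at each step and act on the remaining error.

```python
import math

def transform_points(points, yaw, translation):
    """Apply a rigid transform (rotation about z by `yaw`, then translation) to 3D points."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]

def mean_point_error(current, goal):
    """Mean Euclidean distance between corresponding current and goal points."""
    return sum(math.dist(p, q) for p, q in zip(current, goal)) / len(current)

def toy_step(current, goal, gain=0.5):
    """Hypothetical stand-in for a learned policy: shift the object by a
    fraction of the mean point displacement toward the goal (translation only)."""
    n = len(current)
    shift = [gain * sum(q[i] - p[i] for p, q in zip(current, goal)) / n for i in range(3)]
    return [(p[0] + shift[0], p[1] + shift[1], p[2] + shift[2]) for p in current]

def run_closed_loop(points, goal, steps=20, gain=0.5):
    """Re-observe the points every step (a stand-in for online tracking) and act on the error."""
    errors = []
    for _ in range(steps):
        points = toy_step(points, goal, gain)
        errors.append(mean_point_error(points, goal))
    return points, errors

# Four points on a toy object; the goal pose is expressed as goal point positions.
object_points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]
goal_points = transform_points(object_points, yaw=0.0, translation=(0.1, -0.2, 0.3))
final_points, errors = run_closed_loop(object_points, goal_points)
```

Because the toy rule only translates the object, it converges only when the goal is a pure translation; reaching a rotated goal would need a controller that can also rotate the object, which is the kind of behaviour the abstract attributes to the learned policy.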