The paper introduces PhysGraph, a graph transformer policy for bimanual dexterous manipulation that represents the bimanual system as a kinematic graph with per-link tokenization. It adds a physically-grounded bias generator that injects structural priors, such as kinematic distance and contact states, into the attention mechanism. Experiments show that PhysGraph outperforms ManipTrans in manipulation precision and task success rate while using 51% of its parameters, shows qualitative zero-shot transfer to unseen tool and object geometries, and can be trained on three different robotic hands.
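To make "kinematic graph with per-link tokenization" concrete, here is a minimal sketch under stated assumptions. It is not PhysGraph's code. The toy hand topology, the detached tool node, the 13-value per-link state, and the helper names `hop_distance_matrix` and `tokenize_links` are all hypothetical. The sketch computes all-pairs hop distances on the graph with breadth-first search and turns each link's local state into its own token instead of one flattened configuration vector.

```python
from collections import deque

import numpy as np


def hop_distance_matrix(num_links, edges):
    """All-pairs hop counts on an undirected kinematic graph (BFS from each link).

    Pairs with no connecting path are marked -1.
    """
    adjacency = [[] for _ in range(num_links)]
    for parent, child in edges:
        adjacency[parent].append(child)
        adjacency[child].append(parent)
    dist = np.full((num_links, num_links), -1, dtype=np.int64)
    for src in range(num_links):
        dist[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[src, v] < 0:  # not visited yet from this source
                    dist[src, v] = dist[src, u] + 1
                    queue.append(v)
    return dist


def tokenize_links(link_states, w_embed, b_embed):
    """Per-link tokenization (hypothetical linear embedding): one token per link."""
    return link_states @ w_embed + b_embed


# Hypothetical toy hand: wrist (0) with two 2-link fingers (0->1->2, 0->3->4),
# plus a tool node (5) with no fixed kinematic edge (assumed; contact is dynamic).
edges = [(0, 1), (1, 2), (0, 3), (3, 4)]
dist = hop_distance_matrix(6, edges)

rng = np.random.default_rng(0)
state_dim, d_model = 13, 32  # assumed: position (3) + quaternion (4) + lin/ang velocity (6)
link_states = rng.normal(size=(6, state_dim))
tokens = tokenize_links(link_states, rng.normal(size=(state_dim, d_model)), np.zeros(d_model))
flat = link_states.reshape(-1)  # the single configuration vector alternative
```

Here `dist[2, 4]` is 4 (fingertip to fingertip through the wrist), and `tokens` keeps one 32-dimensional row per link, whereas `flat` collapses all six links into one 78-value vector and discards which values belong to which link.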
By encoding physical priors into a graph transformer, PhysGraph outperforms ManipTrans on bimanual manipulation with roughly half the parameters and shows qualitative zero-shot transfer to new tool and object geometries, suggesting that structured inductive biases can replace some of the physical structure a policy would otherwise have to learn implicitly.
Bimanual dexterous manipulation for tool use remains a formidable challenge in robotics due to the high-dimensional state space and complicated contact dynamics. Existing methods naively represent the entire system state as a single configuration vector, disregarding the rich structural and topological information inherent to articulated hands. We present PhysGraph, a physically-grounded graph transformer policy designed explicitly for challenging bimanual hand-tool-object manipulation. Unlike prior work, we represent the bimanual system as a kinematic graph and introduce per-link tokenization to preserve fine-grained local state information. We propose a physically-grounded bias generator that injects structural priors directly into the attention mechanism, including kinematic spatial distance, dynamic contact states, geometric proximity, and anatomical properties. This allows the policy to explicitly reason about physical interactions rather than learning them implicitly from sparse rewards. Extensive experiments show that PhysGraph significantly outperforms the ManipTrans baseline in manipulation precision and task success rate while using only 51% of the parameters of ManipTrans. Furthermore, the inherent topological flexibility of our architecture enables qualitative zero-shot transfer to unseen tool and object geometries, and the architecture is general enough to be trained on three robotic hands (Shadow, Allegro, Inspire).
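One common way to inject structural priors "directly into the attention mechanism" is an additive bias on the attention logits, softmax(QKᵀ/√d + B)V, in the style of graph transformers such as Graphormer. The sketch below assumes that form; it is not PhysGraph's bias generator. The four terms mirror the priors listed above (kinematic distance, contact state, geometric proximity, anatomical type), but their functional forms, the lookup tables, the weights `w_contact` and `w_prox`, and the toy three-link finger are hypothetical choices made for illustration.

```python
import numpy as np


def physical_bias(hop_dist, contact, positions, link_type,
                  hop_table, type_table, w_contact, w_prox):
    """Hypothetical additive attention bias B[i, j] built from four physical priors."""
    n_hop = len(hop_table)
    # Kinematic distance: one scalar per hop count; far or unreachable (-1) pairs
    # share the last table entry.
    hops = np.where(hop_dist < 0, n_hop - 1, np.minimum(hop_dist, n_hop - 1))
    bias = hop_table[hops]
    # Contact state: raise the bias between pairs of links that are both in contact.
    bias = bias + w_contact * np.outer(contact, contact)
    # Geometric proximity: lower the bias as Euclidean distance between links grows.
    diff = positions[:, None, :] - positions[None, :, :]
    bias = bias - w_prox * np.linalg.norm(diff, axis=-1)
    # Anatomical properties: one scalar per (link type, link type) pair.
    bias = bias + type_table[link_type[:, None], link_type[None, :]]
    return bias


def biased_attention(x, w_q, w_k, w_v, bias):
    """Single-head attention: softmax(Q K^T / sqrt(d) + B) V."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy example: three links in a chain (palm -> proximal -> distal), 8-dim tokens.
rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(3, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
hop_dist = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
contact = np.array([0.0, 1.0, 1.0])  # proximal and distal links touch the tool
positions = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.04], [0.0, 0.0, 0.07]])
link_type = np.array([0, 1, 2])      # palm, proximal, distal
hop_table = np.array([0.5, 0.2, 0.0, -0.5])
type_table = np.zeros((3, 3))

common = (hop_dist, contact, positions, link_type, hop_table, type_table)
bias_no_contact = physical_bias(*common, w_contact=0.0, w_prox=1.0)
bias_contact = physical_bias(*common, w_contact=2.0, w_prox=1.0)
_, attn_no_contact = biased_attention(x, w_q, w_k, w_v, bias_no_contact)
out, attn_contact = biased_attention(x, w_q, w_k, w_v, bias_contact)
```

Turning on the contact term shifts the proximal link's attention mass toward the other contacting link, while the palm's attention row is unchanged because the palm is not in contact. The point of the additive form is that the priors reshape where attention goes without changing the token features themselves; in a trained model the tables and weights would typically be learned rather than fixed.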