UCSD | Mar 2, 2026 | arXiv:2603.01436

PhysGraph: Physically-Grounded Graph-Transformer Policies for Bimanual Dexterous Hand-Tool-Object Manipulation

Runfa Blark Li, David Kim, Xinshuang Liu, Keito Suzuki, Dwait Bhatt, Nikola Raicevic, Xinzhuo Lin, Ki Myung Brian Lee, Nikolay Atanasov, Truong Nguyen

AI Summary

The paper introduces PhysGraph, a graph transformer policy for bimanual dexterous manipulation that represents the system as a kinematic graph with per-link tokenization. It incorporates a physically-grounded bias generator to inject structural priors, such as kinematic distance and contact states, into the attention mechanism. Experiments demonstrate that PhysGraph outperforms ManipTrans in manipulation precision and task success rates with fewer parameters, and exhibits zero-shot transfer to unseen geometries and generalization across different robotic hands.

Key Contribution

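By encoding physical priors into a graph transformer, PhysGraph outperforms ManipTrans in bimanual manipulation precision and task success rate with 51% of its parameters, and shows qualitative zero-shot transfer to unseen tool and object geometries. This suggests that structured inductive biases can stand in for physics that would otherwise have to be learned implicitly.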

Abstract

Bimanual dexterous manipulation for tool use remains a formidable challenge in robotics due to the high-dimensional state space and complicated contact dynamics. Existing methods naively represent the entire system state as a single configuration vector, disregarding the rich structural and topological information inherent to articulated hands. We present PhysGraph, a physically-grounded graph transformer policy designed explicitly for challenging bimanual hand-tool-object manipulation. Unlike prior works, we represent the bimanual system as a kinematic graph and introduce per-link tokenization to preserve fine-grained local state information. We propose a physically-grounded bias generator that injects structural priors directly into the attention mechanism, including kinematic spatial distance, dynamic contact states, geometric proximity, and anatomical properties. This allows the policy to explicitly reason about physical interactions rather than learning them implicitly from sparse rewards. Extensive experiments show that PhysGraph significantly outperforms the ManipTrans baseline in manipulation precision and task success rates while using only 51% of the parameters of ManipTrans. Furthermore, the inherent topological flexibility of our architecture enables qualitative zero-shot transfer to unseen tool/object geometries, and the architecture is sufficiently general to be trained on three robotic hands (Shadow, Allegro, Inspire).
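The idea of injecting structural priors into attention can be illustrated with a small sketch. This is not the paper's implementation: the function names, the fixed weights `w_dist` and `w_contact`, and the `max_hops` clipping are all assumptions made for illustration, and only two of the four priors named in the abstract (kinematic distance and contact state) are modeled. Each link is treated as one token, hop counts on the kinematic graph and per-link contact flags are turned into an additive bias, and that bias is added to the attention logits before the softmax.

```python
import math
from collections import deque


def hop_distances(num_links, edges):
    """All-pairs hop counts on an undirected kinematic graph (BFS per source)."""
    adj = [[] for _ in range(num_links)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[math.inf] * num_links for _ in range(num_links)]
    for src in range(num_links):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] == math.inf:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def attention_bias(dist, in_contact, w_dist=-0.5, w_contact=1.0, max_hops=4):
    """Hypothetical bias: penalize kinematically distant pairs, boost pairs of
    distinct links that are both in contact. Weights are hand-picked here; a
    learned generator would produce them instead."""
    n = len(dist)
    bias = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            hops = min(dist[i][j], max_hops)  # clip so disconnected pairs stay finite
            both = in_contact[i] and in_contact[j] and i != j
            bias[i][j] = w_dist * hops + (w_contact if both else 0.0)
    return bias


def biased_attention(q, k, v, bias):
    """Single-head softmax(q k^T / sqrt(d) + bias) v over per-link tokens."""
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        logits = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + bias[i][j]
            for j, kj in enumerate(k)
        ]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        w = [e / z for e in exps]
        weights.append(w)
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


if __name__ == "__main__":
    # Toy system: palm(0) - proximal(1) - distal(2), plus a free tool link (3).
    dist = hop_distances(4, [(0, 1), (1, 2)])
    bias = attention_bias(dist, in_contact=[False, False, True, True])
    tokens = [[0.0, 0.0] for _ in range(4)]  # zero queries/keys: logits equal the bias
    values = [[float(i)] for i in range(4)]
    _, attn = biased_attention(tokens, tokens, values, bias)
    for row in attn:
        print([round(x, 3) for x in row])
```

With zero queries and keys, the printed attention rows depend only on the bias, which makes the effect of the priors easy to inspect: the distal link attends more strongly to the tool it touches than the palm does, even though the tool is not part of the hand's kinematic chain.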

Architecture Design (Transformers, SSMs, MoE) | Robotics & Embodied AI | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 25

Year: 2026

Venue: N/A
