NVIDIA | Sapienza | Jun 15, 2026 | arXiv:2606.17055

T-Rex: Tactile-Reactive Dexterous Manipulation

Dantong Niu, Zhuoyang Liu, Zekai Wang, Boning Shao, Zhao-Heng Yin, Anirudh Pai, Yuvan Sharma, Stefano Saravalle, Ruijie Zheng, Jing Wang, Ryan Punamiya, Mengda Xu, Yuqi Xie, Yunfan Jiang, Letian Fu, Konstantinos Kallidromitis, Matteo Gioia, Junyi Zhang, Jiaxin Ge, Haiwen Feng, Fabio Galasso, Wei Zhan, David M. Chan, Yutong Bai, Roei Herzig, Jiahui Lei, Fei-Fei Li, Ken Goldberg, Jitendra Malik, Pieter Abbeel, Yuke Zhu, Danfei Xu, Jim Fan, Trevor Darrell

AI Summary

This paper addresses the limitations of current Vision-Language-Action (VLA) models in robotic manipulation by introducing a large-scale tactile-rich dataset and a variable-rate Mixture-of-Transformers architecture that incorporates a temporal tactile VQ-VAE encoder. The authors collected 100 hours of tactile-rich data so that robots can react dynamically to touch signals, a capability that existing models have largely neglected. Their approach achieves an average success rate more than 30% higher than the strongest baseline across 12 manipulation tasks that require delicate force control and deformable object manipulation.

Key Contribution

Tactile-reactive policies achieve an average success rate more than 30% higher than the strongest baseline on 12 manipulation tasks that require delicate force control and deformable object manipulation.

Abstract

The ability to react dynamically to tactile signals has long been considered crucial to agile human-level dexterity. Yet contemporary learning-based Vision-Language-Action (VLA) models for robotic manipulation generally either overlook the tactile modality or are limited to encoders with static cues, due in part to the scarcity of diverse training data and standardized evaluation, architectural constraints in current VLA models, and limitations of static tactile encoders. In this paper, we push the frontier of tactile-reactive manipulation by addressing all of these limitations. We propose a large-scale, 100-hour tactile-rich dataset collected via a novel, data-efficient recipe that prioritizes elementary motor primitives. To effectively exploit naturally high-frequency touch signals without sacrificing the capabilities of existing VLAs, we introduce a variable-rate Mixture-of-Transformers (MoT) architecture equipped with a novel temporal tactile VQ-VAE encoder. We demonstrate the effectiveness of tactile-reactive policies on 12 manipulation tasks requiring delicate force control and deformable object manipulation, achieving over 30% higher average success rate than the strongest baseline.
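To make the contrast between static and temporal tactile encoding concrete, here is a minimal sketch of the vector-quantization step that VQ-VAEs in general rely on: map a feature vector to the index of its nearest codebook entry. It is not the paper's encoder. The two-number window summary (mean pressure, net change), the hand-written `CODEBOOK`, the synthetic `signal`, and all function names are assumptions chosen for illustration; in a real VQ-VAE both the encoder and the codebook are learned.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry nearest to `latent`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(latent, codebook[i]))


def window_features(signal: Sequence[float], window: int) -> List[List[float]]:
    """Split a 1-D pressure stream into non-overlapping windows and
    summarize each one as [mean pressure, net change across the window].
    This hand-made summary stands in for a learned temporal encoder."""
    features = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        features.append([sum(chunk) / window, chunk[-1] - chunk[0]])
    return features


def tokenize(signal: Sequence[float], codebook: Sequence[Vector],
             window: int) -> List[int]:
    """Turn a pressure stream into one discrete token per window."""
    return [quantize(f, codebook) for f in window_features(signal, window)]


# Hypothetical codebook over (mean pressure, net change); not learned.
CODEBOOK = [
    [0.0, 0.0],   # 0: no contact
    [0.5, 1.0],   # 1: contact onset (pressure rising)
    [1.0, 0.0],   # 2: steady contact
    [0.5, -1.0],  # 3: release (pressure falling)
]

# Synthetic pressure trace: idle, press, hold, release.
signal = [0.0, 0.0, 0.0, 0.0,
          0.0, 0.3, 0.7, 1.0,
          1.0, 1.0, 1.0, 1.0,
          1.0, 0.7, 0.3, 0.0]

tokens = tokenize(signal, CODEBOOK, window=4)
print(tokens)
```

In this toy trace the second and fourth windows have the same mean pressure and differ only in the direction of change, so they receive different tokens only because the summary spans several frames. A single-frame, static encoding of the mean alone could not separate contact onset from release.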

Multimodal Models | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
