This paper addresses the limitations of current Vision-Language-Action (VLA) models in robotic manipulation by introducing a large-scale tactile-rich dataset and a variable-rate Mixture-of-Transformers (MoT) architecture equipped with a temporal tactile VQ-VAE encoder. The authors collected a 100-hour tactile-rich dataset so that policies can react dynamically to tactile signals, a capability that existing VLA models largely neglect. Across 12 manipulation tasks requiring delicate force control and deformable object manipulation, the approach achieves an average success rate over 30% higher than the strongest baseline.
Tactile-reactive policies achieve an average success rate over 30% higher than the strongest baseline on 12 delicate manipulation tasks.
The ability to react dynamically to tactile signals has long been considered crucial to agile human-level dexterity. Yet contemporary learning-based Vision-Language-Action (VLA) models for robotic manipulation generally either overlook the tactile modality or are limited to encoders with static cues, due in part to the scarcity of diverse training data and standardized evaluation, architectural constraints in current VLA models, and limitations of static tactile encoders. In this paper, we push the frontier of tactile-reactive manipulation by addressing all of these limitations. We propose a large-scale, 100-hour tactile-rich dataset collected via a novel, data-efficient recipe that prioritizes elementary motor primitives. To effectively exploit naturally high-frequency touch signals without sacrificing the capabilities of existing VLAs, we introduce a variable-rate Mixture-of-Transformers (MoT) architecture equipped with a novel temporal tactile VQ-VAE encoder. We demonstrate the effectiveness of tactile-reactive policies on 12 manipulation tasks requiring delicate force control and deformable object manipulation, achieving over 30% higher average success rate than the strongest baseline.
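
To make the idea of a temporal tactile VQ-VAE encoder more concrete, the sketch below shows only the generic vector-quantization step: a high-rate tactile stream is cut into short windows, each window is reduced to a latent vector, and each latent is replaced by the index of its nearest codebook entry. This is an illustration under stated assumptions, not the authors' encoder. The abstract does not specify the window length, codebook size, or encoder design, so `window_features`, `quantize`, the window-averaging stand-in for a learned encoder, and all data values here are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def window_features(frames: Sequence[Vector], window: int) -> List[List[float]]:
    """Hypothetical stand-in for a learned temporal encoder.

    Averages each non-overlapping window of `window` frames into one latent
    vector. Trailing frames that do not fill a window are dropped.
    """
    latents = []
    for start in range(0, len(frames) - window + 1, window):
        chunk = frames[start:start + window]
        dim = len(chunk[0])
        latents.append([sum(f[d] for f in chunk) / window for d in range(dim)])
    return latents


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each latent to its nearest codebook entry (squared Euclidean distance)."""
    indices, quantized = [], []
    for z in latents:
        dists = [sum((a - b) ** 2 for a, b in zip(z, code)) for code in codebook]
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


# Made-up 2-channel tactile frames and a made-up 2-entry codebook.
frames = [[0.0, 0.1], [0.2, 0.1], [0.9, 1.0], [1.1, 1.0], [0.5, 0.5]]
codebook = [[0.0, 0.0], [1.0, 1.0]]

latents = window_features(frames, window=2)   # one latent per 2-frame window
tokens, quantized = quantize(latents, codebook)
print(tokens)
```

In a real VQ-VAE the encoder, decoder, and codebook are learned jointly; the averaging step above only marks where a temporal encoder would sit before quantization.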