This paper introduces Dream-Tac, a unified Tactile-World Action Model that enhances robot manipulation in contact-rich environments by integrating tactile signals with visual observations. The model employs contact-gated visuotactile fusion and a contact-aware attention bias to improve cross-modal interactions, resulting in significantly improved action accuracy. With a dual-level acceleration strategy, Dream-Tac achieves up to 2.9× faster training and 1.8× faster inference while enhancing performance across six manipulation tasks by an average of 31.7%.
Dream-Tac improves action accuracy on contact-rich robot manipulation tasks by 31.7% on average by fusing tactile and visual data, with acceleration aimed at real-time deployment.
World action models inherit the predictive capability of world models, enabling action generation to be guided by anticipated future observations. However, they rely primarily on vision and often fail in contact-rich manipulation, where critical cues arise from physical interaction. In this paper, we propose Dream-Tac, a unified Tactile-World Action Model that jointly models actions, future visual observations, and tactile dynamics. Specifically, Dream-Tac introduces (i) contact-gated visuotactile fusion to selectively integrate tactile signals and (ii) a contact-aware attention bias to better regulate cross-modal interactions during manipulation. To support real-time deployment, we further design a dual-level acceleration strategy, reformulating the contact-aware bias to preserve the fused attention path during training and introducing cache-based diffusion acceleration at inference, achieving up to 2.9× faster training and 1.8× faster inference. Across six contact-rich manipulation tasks, Dream-Tac improves action accuracy by 31.7% on average, demonstrating the effectiveness of unified visuotactile world modeling. Code is available at https://github.com/LYFCLOUDFAN/Dream-Tac.
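
The abstract names contact-gated fusion and a contact-aware attention bias without specifying how either is computed. The sketch below is one plausible reading, not the authors' implementation: a soft gate derived from a scalar contact estimate scales the tactile tokens, and an additive bias lowers attention logits on tactile keys when the gate is small. Every name here (`contact_gate`, `gated_fusion`, `attention_weights`), the sigmoid form of the gate, and the `threshold`, `sharpness`, and `bias_scale` values are assumptions made for illustration.

```python
import math


def contact_gate(contact, threshold=0.5, sharpness=10.0):
    """Map a contact estimate in [0, 1] to a soft gate in (0, 1).

    Assumed form: a sigmoid centred on `threshold`.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (contact - threshold)))


def gated_fusion(visual_tokens, tactile_tokens, gate):
    """Scale each tactile token by the gate and append it to the visual tokens."""
    scaled = [[gate * x for x in token] for token in tactile_tokens]
    return visual_tokens + scaled


def attention_weights(query, keys, num_visual, gate, bias_scale=4.0):
    """Softmax attention of one query over all keys.

    Keys at index >= num_visual are treated as tactile and receive an
    additive bias of bias_scale * (gate - 1): close to zero in contact,
    strongly negative out of contact.
    """
    dim = len(query)
    logits = []
    for index, key in enumerate(keys):
        score = sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        if index >= num_visual:
            score += bias_scale * (gate - 1.0)
        logits.append(score)
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Toy tokens: two visual, one tactile, all two-dimensional.
visual = [[1.0, 0.0], [0.0, 1.0]]
tactile = [[0.5, 0.5]]
query = [1.0, 1.0]

for contact in (0.0, 1.0):
    gate = contact_gate(contact)
    keys = gated_fusion(visual, tactile, gate)
    weights = attention_weights(query, keys, num_visual=len(visual), gate=gate)
    print(f"contact={contact:.1f} gate={gate:.3f} tactile attention={weights[-1]:.3f}")
```

In this toy setup the tactile token receives almost no attention when the contact estimate is low and a substantial share once contact is detected, which is the qualitative behaviour the abstract attributes to the two mechanisms. A learned gate and learned bias would replace the fixed constants in a real model.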