Fudan | Great Bay University | NJU | Robotics | Xiamen University | May 26, 2026 | arXiv:2605.27154

Touch-R1: Reinforcing Touch Reasoning in MLLMs

Yingxin Lai, Yafei Zhou, Fucai Zhu, Weihao Yuan

AI Summary

The paper introduces Touch-R1, a tactile reasoning MLLM based on Qwen2.5-VL-7B, trained with a tactile-grounded GRPO objective that combines ordinal-aware accuracy, cross-sensor physical consistency, structured-format control, and input-side tactile grounding. The objective includes a tactile-use reward that assigns credit only when authentic tactile inputs improve correctness compared to counterfactual controls. Evaluated on the new TouchReason-Bench, Touch-R1-7B outperforms Octopi-13B by 18.4% and GPT-4o by 24.7% on average, and its reasoning traces show emergent behaviors grounded in physical contact.

Key Contribution

Touch-R1-7B, trained with tactile-grounded reinforcement learning, outperforms GPT-4o by 24.7% on average on TouchReason-Bench, indicating that MLLMs can learn to ground reasoning in physical contact.

Abstract

While rule-based reinforcement learning has recently catalyzed explicit reasoning in multimodal models, tactile reasoning remains largely underexplored. Existing tactile-language models primarily rely on supervised or contrastive objectives, which limits their capacity to ground predictions in physical evidence or rectify misleading visual priors. Tactile reasoning introduces two modality-specific challenges: the ordinal nature of physical attributes (e.g., hardness, roughness) and the cross-sensor distribution shifts inherent in optical tactile hardware. In this work, we introduce TouchReason-1M, a large-scale multimodal dataset comprising over 1M synchronized tactile pairs across four distinct sensors, and TouchReason-Bench, a rigorous framework for evaluating tactile perception and visual-tactile conflict resolution. Building upon these, we propose Touch-R1, a tactile reasoning MLLM based on Qwen2.5-VL-7B. Touch-R1 is trained via a tactile-grounded GRPO objective that combines ordinal-aware accuracy, cross-sensor physical consistency, structured-format control, and an input-side tactile grounding objective. Specifically, the tactile-use reward assigns credit only when authentic tactile inputs yield superior correctness relative to counterfactual controls where the tactile stream is removed, shuffled, or noise-masked. On TouchReason-Bench, Touch-R1-7B outperforms Octopi-13B by 18.4% and GPT-4o by 24.7% on average. Its structured reasoning traces reveal emergent behaviors of probing, comparison, and revision, demonstrating that R1-style reasoning can be effectively grounded in physical contact.
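The abstract describes the tactile-use reward only in words, so a small sketch may help make the gating idea concrete. The following is an illustration under stated assumptions, not the authors' implementation: the function names `ordinal_accuracy` and `tactile_use_reward`, the linear distance-based partial credit, the integer level encoding, the `margin` parameter, and the choice to compare against the best-scoring control are all hypothetical.

```python
def ordinal_accuracy(pred: int, target: int, num_levels: int) -> float:
    """Partial credit that decays linearly with ordinal distance.

    Assumed form: the paper only states that accuracy is "ordinal-aware".
    Levels are assumed to be integers in [0, num_levels - 1].
    """
    if num_levels < 2:
        raise ValueError("need at least two ordinal levels")
    return 1.0 - abs(pred - target) / (num_levels - 1)


def tactile_use_reward(
    score_real: float,
    control_scores: list[float],
    margin: float = 0.0,
) -> float:
    """Return 1.0 only if the real tactile input beats every control.

    control_scores holds the correctness obtained when the tactile stream
    is removed, shuffled, or noise-masked. Comparing against the maximum
    control score (rather than, say, the mean) is an assumption.
    """
    if not control_scores:
        raise ValueError("at least one counterfactual control is required")
    return 1.0 if score_real > max(control_scores) + margin else 0.0


# Hypothetical example: hardness on a 5-level scale, true level 3.
real = ordinal_accuracy(pred=3, target=3, num_levels=5)
controls = [
    ordinal_accuracy(pred=1, target=3, num_levels=5),  # tactile removed
    ordinal_accuracy(pred=0, target=3, num_levels=5),  # tactile shuffled
    ordinal_accuracy(pred=2, target=3, num_levels=5),  # noise-masked
]
reward = tactile_use_reward(real, controls)
```

In this sketch, an answer that is equally correct without the real tactile stream earns no tactile-use credit, which is one way to read the requirement that credit is assigned only when authentic tactile input yields superior correctness.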

Multimodal Models | Reasoning & Chain-of-Thought | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
