NYU | May 27, 2026 | arXiv:2605.28678

DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution

Yunhai Hu, Zining Liu, Xiangyang Yin, Tianhua Xia, Bo Bao, Eric Sather, Vithursan Thangarasa, Sai Qian Zhang

AI Summary

DREAM-R enhances speculative reasoning in multimodal models by introducing Speculative Alignment Policy Optimization (SAPO), a reinforcement learning objective that aligns draft models with target reasoning trajectories. It also incorporates a Threshold-based Verification Mechanism (TBVM) to ensure stable acceptance of speculative steps, minimizing error propagation. The Fully Parallel Speculative Reasoning (FPSR) framework parallelizes draft generation, target-side reasoning, and verification, achieving significant speedups without sacrificing accuracy on reasoning-heavy benchmarks.

Key Contribution

DREAM-R uses reinforcement learning to align speculative drafts with target reasoning trajectories, speeding up speculative reasoning in multimodal models while preserving target-model accuracy.

Abstract

Speculative reasoning has recently been proposed as a means to accelerate reasoning-intensive generation in large multimodal models, but its effectiveness is often constrained by misalignment between speculative drafts and target-verified reasoning. In this work, we introduce DREAM-R, a framework that substantially improves the performance of speculative reasoning. At its core, DREAM-R employs Speculative Alignment Policy Optimization (SAPO), a reinforcement-learning objective that trains draft models to generate reasoning steps that are both faithful to target trajectories and concise. We further propose a Threshold-based Verification Mechanism (TBVM) that uses a ratio-based criterion to provide stable and interpretable acceptance of speculative steps only when positive evidence clearly dominates, thereby preventing error propagation. Building on these components, we develop a Fully Parallel Speculative Reasoning (FPSR) framework that parallelizes draft generation, target-side reasoning, and verification across multi-step reasoning, enabling early stopping and clean fallback. Experiments on reasoning-heavy benchmarks demonstrate up to speedup while preserving target-model accuracy, yielding substantial efficiency gains without compromising reasoning quality.
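To make the ratio-based acceptance idea concrete, here is a minimal sequential sketch in Python. It is not the paper's implementation: the abstract does not give the exact criterion, so the ratio used here (the verifier's probability that a drafted step is acceptable divided by the probability that it is not), the default threshold of 2.0, and the names `accept_step`, `speculative_reason`, `draft_step`, `target_step`, and `judge` are all assumptions for illustration. The sketch also runs steps one at a time and does not model the parallel execution of FPSR.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical callable types; these are not taken from the paper.
StepFn = Callable[[List[str]], str]
JudgeFn = Callable[[List[str], str], Tuple[float, float]]


def accept_step(logp_yes: float, logp_no: float, threshold: float = 2.0) -> bool:
    """Accept a drafted step only when positive evidence clearly dominates.

    Assumed criterion: P(yes) / P(no) >= threshold, evaluated in log space.
    """
    return (logp_yes - logp_no) >= math.log(threshold)


def speculative_reason(
    draft_step: StepFn,
    target_step: StepFn,
    judge: JudgeFn,
    max_steps: int,
    threshold: float = 2.0,
) -> List[str]:
    """Sequential draft-then-verify loop with fallback to the target model."""
    steps: List[str] = []
    for _ in range(max_steps):
        candidate = draft_step(steps)  # cheap draft proposal
        logp_yes, logp_no = judge(steps, candidate)  # target-side verdict
        if accept_step(logp_yes, logp_no, threshold):
            steps.append(candidate)  # keep the drafted step
        else:
            steps.append(target_step(steps))  # fall back to the target model
    return steps
```

Under these assumptions, a higher threshold rejects more drafted steps and leans more on the target model, trading speed for a lower risk of propagating a bad step.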

Multimodal Models, Reasoning & Chain-of-Thought, RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
