Feb 15, 2026 | arXiv:2602.14147

LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models

Shufan Li, Yuchen Zhu, Jiuxiang Gu, Kangning Liu, Zhe Lin, Aditya Grover, Jason Kuen

AI Summary

LaViDa-R1 is introduced as a multimodal diffusion language model (dLLM) designed for general-purpose reasoning, offering an alternative to auto-regressive LLMs. The model is trained using a unified post-training framework combining supervised finetuning (SFT) and multi-task reinforcement learning (RL), incorporating techniques like answer-forcing, tree search, and complementary likelihood estimation. Experiments demonstrate LaViDa-R1's effectiveness across diverse multimodal tasks such as visual math reasoning, reason-intensive grounding, and image editing.

Key Contribution

A unified post-training framework lets a multimodal diffusion language model master reasoning across visual math, grounding, and image editing tasks without task-specific reinforcement learning.

Abstract

Diffusion language models (dLLMs) have recently emerged as a promising alternative to auto-regressive LLMs. Recent works have further extended them to multimodal understanding and generation tasks. In this work, we propose LaViDa-R1, a multimodal, general-purpose reasoning dLLM. Unlike existing works that build reasoning dLLMs through task-specific reinforcement learning, LaViDa-R1 incorporates diverse multimodal understanding and generation tasks in a unified manner. In particular, LaViDa-R1 is built with a novel unified post-training framework that seamlessly integrates supervised finetuning (SFT) and multi-task reinforcement learning (RL). It employs several novel training techniques, including answer-forcing, tree search, and complementary likelihood estimation, to enhance effectiveness and scalability. Extensive experiments demonstrate LaViDa-R1's strong performance on a wide range of multimodal tasks, including visual math reasoning, reason-intensive grounding, and image editing.

Architecture Design (Transformers, SSMs, MoE) | Multimodal Models | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
