Feb 12, 2026 · arXiv:2602.12221

Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching

Onkar Susladkar, Tushar Prakash, Gayatri S Deshmukh, Kiet A. Nguyen, Jiaxun Zhang, A. Juvekar, Tianshu Bao, Lin Chai, Sparsh Mittal, I. Dhillon, Ismini Lourentzou

AI Summary

The paper introduces UniDFlow, a discrete flow-matching framework that unifies multimodal understanding, generation, and editing tasks. UniDFlow uses task-specific low-rank adapters to decouple understanding and generation, preventing objective interference. It also employs a reference-based multimodal preference alignment method to optimize relative outcomes under identical conditioning, leading to improved faithfulness and controllability.

Key Contribution

Achieves SOTA multimodal performance across eight benchmarks, with strong zero-shot generalization and no task-specific training, by decoupling understanding and generation with task-specific low-rank adapters inside a unified discrete flow-matching framework.

Abstract

We propose UniDFlow, a unified discrete flow-matching framework for multimodal understanding, generation, and editing. It decouples understanding and generation via task-specific low-rank adapters, avoiding objective interference and representation entanglement, while a novel reference-based multimodal preference alignment optimizes relative outcomes under identical conditioning, improving faithfulness and controllability without large-scale retraining. UniDFlow achieves SOTA performance across eight benchmarks and exhibits strong zero-shot generalization to tasks including inpainting, in-context image generation, reference-based editing, and compositional generation, despite no explicit task-specific training.
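The adapter decoupling described in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the class name `TaskLoRALinear`, the task labels, the zero initialization of `B`, and the `alpha / rank` scaling are all assumptions borrowed from common low-rank adaptation practice. It only shows the general idea of one frozen weight shared by every task, with a separate low-rank update selected per task so that changing one task's adapter leaves the other task's output untouched.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class TaskLoRALinear:
    """Hypothetical linear layer: a shared frozen weight W plus one
    low-rank pair (A, B) per task. Output is W x + scale * B (A x)."""

    def __init__(self, d_in, d_out, rank, tasks, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Shared base weight, treated as frozen in this sketch.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        self.scale = alpha / rank  # assumed scaling convention
        self.adapters = {}
        for task in tasks:
            A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
            B = [[0.0] * rank for _ in range(d_out)]  # zero init: no change at start
            self.adapters[task] = (A, B)

    def forward(self, x, task):
        A, B = self.adapters[task]  # only the selected task's adapter is used
        base = matvec(self.W, x)
        delta = matvec(B, matvec(A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = TaskLoRALinear(d_in=4, d_out=3, rank=2, tasks=["understanding", "generation"])
x = [1.0, 2.0, 3.0, 4.0]
before = layer.forward(x, "understanding")

# Simulate an update to the generation adapter only.
layer.adapters["generation"][1][0][0] = 1.0

after = layer.forward(x, "understanding")
print(before == after)  # the understanding path is unaffected
```

Because each task reads only its own `(A, B)` pair, an update driven by one objective cannot alter the other task's effective weights, which is the intuition behind avoiding objective interference.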

Architecture Design (Transformers, SSMs, MoE) · Multimodal Models · Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 71

Year: 2026

Venue: N/A
