UESTC, UNC | Feb 26, 2026 | arXiv:2602.23022

DMAligner: Enhancing Image Alignment via Diffusion Model Based View Synthesis

Xinglong Luo, Ao Luo, Zhengning Wang, Yueqi Yang, Chaoyu Feng, Lei Lei, Bing Zeng, Shuaicheng Liu

AI Summary

The paper introduces DMAligner, a diffusion-based image alignment framework that synthesizes a novel view instead of warping with optical flow, avoiding flow-based methods' sensitivity to occlusions and illumination changes. The authors propose a Dynamics-aware Diffusion Training approach that incorporates a Dynamics-aware Mask Producing (DMP) module to distinguish dynamic foreground regions from static backgrounds, improving the diffusion model's ability to handle these challenging scenarios. They also build the Dynamic Scene Image Alignment (DSIA) dataset in Blender, with 1,033 indoor and outdoor scenes and over 30K image pairs, and report superior results on the DSIA benchmarks, along with qualitative comparisons on several widely used video datasets.

Key Contribution

Ditch optical flow for image alignment: DMAligner leverages diffusion models to synthesize novel views, sidestepping traditional warping's occlusion and illumination woes.

Abstract

Image alignment is a fundamental task in computer vision with broad applications. Existing methods predominantly employ optical flow-based image warping. However, this technique is susceptible to common challenges such as occlusions and illumination variations, leading to degraded alignment visual quality and compromised accuracy in downstream tasks. In this paper, we present DMAligner, a diffusion-based framework for image alignment through alignment-oriented view synthesis. DMAligner is crafted to tackle the challenges in image alignment from a new perspective, employing a generation-based solution that showcases strong capabilities and avoids the problems associated with flow-based image warping. Specifically, we propose a Dynamics-aware Diffusion Training approach for learning conditional image generation, synthesizing a novel view for image alignment. This incorporates a Dynamics-aware Mask Producing (DMP) module to adaptively distinguish dynamic foreground regions from static backgrounds, enabling the diffusion model to more effectively handle challenges that classical methods struggle to solve. Furthermore, we develop the Dynamic Scene Image Alignment (DSIA) dataset using Blender, which includes 1,033 indoor and outdoor scenes with over 30K image pairs tailored for image alignment. Extensive experimental results demonstrate the superiority of the proposed approach on DSIA benchmarks, as well as on a series of widely-used video datasets for qualitative comparisons. Our code is available at https://github.com/boomluo02/DMAligner.

Architecture Design (Transformers, SSMs, MoE), Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 66

Year: 2026

Venue: N/A
