Hunan, NJU, University of Electronic Science and Technology

May 25, 2026

arXiv:2605.25775

DRFusion: Drift-Resilient Temporally Consistent Infrared-Visible Video Fusion

Xingyuan Li, Haoyuan Xu, Shulin Li, Xiang Chen, Zhiying Jiang, Jinyuan Liu

AI Summary

The paper introduces DRFusion, a novel diffusion-based approach for infrared and visible video fusion that addresses the challenge of maintaining temporal consistency in dynamic scenes. It reframes video fusion as history-conditioned motion generation, using Stabilized History Guidance and Soft Temporal Anchoring to implicitly aggregate motion dynamics. Experiments demonstrate state-of-the-art performance in fusion quality and temporal stability compared to methods relying on optical flow or frame-by-frame diffusion.

Key Contribution

DRFusion makes diffusion-based video fusion temporally stable by reframing the task as history-conditioned motion generation, avoiding the limitations of optical flow and frame-by-frame processing.

Abstract

Infrared and visible video fusion is essential for achieving comprehensive perception in dynamic scenes. However, maintaining temporal consistency remains a formidable challenge. Conventional methods relying on optical flow often suffer from geometric rigidity and ghosting artifacts. Moreover, standard diffusion-based fusion models typically operate in a frame-by-frame manner; when extended to autoregressive settings, they lack intrinsic temporal constraints and are prone to severe error accumulation and drifting, where minor artifacts amplify over time. To address these limitations, we propose a drift-resilient video fusion method that reformulates the task as history-conditioned motion generation. We introduce Stabilized History Guidance and Soft Temporal Anchoring to reframe temporal consistency as spectral filtering, implicitly aggregating motion dynamics without rigid alignment. Furthermore, our Decoupled Structure-Motion Adaptation strategy bridges pre-trained priors and structural constraints via two-stage training and latent refinement. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both fusion quality and temporal stability.

Computer Vision, Multimodal Models

