The paper introduces DRFusion, a novel diffusion-based approach for infrared and visible video fusion that addresses the challenge of maintaining temporal consistency in dynamic scenes. It reframes video fusion as history-conditioned motion generation, using Stabilized History Guidance and Soft Temporal Anchoring to implicitly aggregate motion dynamics. Experiments demonstrate state-of-the-art performance in fusion quality and temporal stability compared to methods relying on optical flow or frame-by-frame diffusion.
Reframing video fusion as history-conditioned motion generation lets diffusion models produce temporally stable results, avoiding the ghosting of optical-flow methods and the drift of frame-by-frame processing.
Infrared and visible video fusion is essential for achieving comprehensive perception in dynamic scenes. However, maintaining temporal consistency remains a formidable challenge. Conventional methods relying on optical flow often suffer from geometric rigidity and ghosting artifacts. Moreover, standard diffusion-based fusion models typically operate in a frame-by-frame manner; when extended to autoregressive settings, they lack intrinsic temporal constraints and are prone to severe error accumulation and drifting, where minor artifacts amplify over time. To address these limitations, we propose a drift-resilient video fusion method that reformulates the task as history-conditioned motion generation. We introduce Stabilized History Guidance and Soft Temporal Anchoring to reframe temporal consistency as spectral filtering, implicitly aggregating motion dynamics without rigid alignment. Furthermore, our Decoupled Structure-Motion Adaptation strategy bridges pre-trained priors and structural constraints via two-stage training and latent refinement. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both fusion quality and temporal stability.