EPITA, EPITA Research Lab, Lagrange Mathematics and Computing, MBZUAI, Polytechnique, Sorbonne
Feb 15, 2026
arXiv:2602.14157

When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance

Ahmed Ghorbel, Badr Moufad, Navid Bagheri Shouraki, Alain Oliviero Durmus, Thomas Hirtz, Eric Moulines, Jimmy Olsson, Yazid Janati

AI Summary

This paper investigates text-driven image and video editing as an inpainting problem using test-time guidance with diffusion models. It addresses the computational bottleneck of vector-Jacobian product (VJP) computations in existing guidance methods by leveraging and theoretically justifying a VJP-free approximation. The work demonstrates that this efficient test-time guidance achieves competitive or superior performance compared to training-based approaches on large-scale image and video editing benchmarks.

Key Contribution

Skip the expensive gradients: a simple VJP-free approximation lets you edit images and videos with diffusion models just as well as training-heavy approaches.

Abstract

Text-driven image and video editing can be naturally cast as inpainting problems, where masked regions are reconstructed to remain consistent with both the observed content and the editing prompt. Recent advances in test-time guidance for diffusion and flow models provide a principled framework for this task; however, existing methods rely on costly vector-Jacobian product (VJP) computations to approximate the intractable guidance term, limiting their practical applicability. Building upon the recent work of Moufad et al. (2025), we provide theoretical insights into their VJP-free approximation and substantially extend their empirical evaluation to large-scale image and video editing benchmarks. Our results demonstrate that test-time guidance alone can achieve performance comparable to, and in some cases surpassing, that of training-based methods.
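To make the cost difference concrete, here is a minimal toy sketch of inpainting guidance with and without a VJP. It is not the approximation of Moufad et al. (2025), which the abstract does not spell out. It assumes a hypothetical linear "denoiser" `W` and, for the VJP-free variant, the common simplification of treating the denoiser Jacobian as the identity. All names (`W`, `denoise`, `guidance_vjp`, `guidance_vjp_free`) are illustrative.

```python
# Toy comparison of VJP-based and VJP-free inpainting guidance.
# Assumptions (not from the paper): a linear denoiser x0_hat = W @ x_t,
# and an identity-Jacobian shortcut for the VJP-free variant.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]

# Hypothetical denoiser weights: a small smoothing matrix.
W = [
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.2, 0.8],
]

def denoise(x_t):
    """Stand-in for the network's clean-sample prediction from x_t."""
    return matvec(W, x_t)

def masked_residual(x_t, y, mask):
    """Residual on observed pixels: mask[i] is 1 where y[i] must be kept."""
    x0_hat = denoise(x_t)
    return [m * (yi - xi) for m, yi, xi in zip(mask, y, x0_hat)]

def guidance_vjp(x_t, y, mask):
    """Gradient of -0.5 * ||mask * (y - denoise(x_t))||^2 w.r.t. x_t.

    This equals J^T r with J the denoiser Jacobian (here J = W). For a
    real network, J^T r requires backpropagating through the denoiser.
    """
    return matvec(transpose(W), masked_residual(x_t, y, mask))

def guidance_vjp_free(x_t, y, mask):
    """Same residual, but with J replaced by the identity (assumption)."""
    return masked_residual(x_t, y, mask)

x_t = [0.0, 0.0, 0.0, 0.0]
y = [1.0, 1.0, 1.0, 1.0]
mask = [1, 1, 0, 0]  # first two pixels observed, last two to be edited

g_vjp = guidance_vjp(x_t, y, mask)
g_free = guidance_vjp_free(x_t, y, mask)
```

In this toy setting the VJP-free direction is nonzero only on observed pixels, while the VJP version also spreads the correction into neighbouring masked pixels through `W`. With a real network, the first variant needs a backward pass through the denoiser at every sampling step, and the second needs only the forward pass.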

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
