The paper introduces Recipe Diffusion, a framework for generating coherent visual recipe instructions by enforcing cross-frame consistency and region-aware noise application. The authors modify attention layers to share key-value pairs across frames, promoting global consistency, and apply noise differentially based on object regions to preserve object identity while allowing contextual variation. Experiments show that the framework generates recipe instruction sequences with better coherence and object fidelity than independent frame generation.
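The key-value sharing idea can be illustrated with a minimal sketch. This is not the authors' implementation — it is a toy NumPy version (with hypothetical frame and feature dimensions) showing the core move: every frame's queries attend to a key/value bank pooled from all frames, so appearance features propagate across the sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(q_frames, k_frames, v_frames):
    """Toy cross-frame attention: each frame's queries attend to the
    concatenated keys/values of ALL frames, sharing visual features
    across the sequence instead of per-frame self-attention."""
    # Pool keys and values from every frame into one shared bank.
    k_shared = np.concatenate(k_frames, axis=0)   # (F*T, d)
    v_shared = np.concatenate(v_frames, axis=0)   # (F*T, d)
    d = k_shared.shape[-1]
    outputs = []
    for q in q_frames:                            # each q: (T, d)
        scores = q @ k_shared.T / np.sqrt(d)      # (T, F*T)
        outputs.append(softmax(scores) @ v_shared)
    return outputs
```

In the actual framework this substitution happens inside the attention layers of a pre-trained diffusion model; the sketch only shows why pooled keys/values yield globally consistent outputs.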
Achieve coherent, step-by-step visual recipe generation without training by intelligently sharing information across frames and applying noise selectively.
This paper presents a cross-frame attention and region-aware diffusion method for generating coherent, step-by-step visual instructions for cooking recipes. Our approach combines two complementary mechanisms: (1) cross-frame key-value sharing in attention layers to maintain global consistency across sequential frames, and (2) region-aware noise application, which preserves object identity while allowing contextual changes. Unlike conventional models that generate each image independently, our training-free framework leverages pre-trained detection and segmentation models to create region masks and modifies the attention mechanism to share visual features across frames. By integrating differential noise application with cross-frame attention consistency, our system generates recipe instruction sequences that maintain both global coherence and local object identity throughout each step.
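The region-aware noise mechanism can be sketched as follows. This is a simplified, hypothetical version (NumPy, single noise step, made-up `preserve` parameter) of the underlying idea: noise is applied at full strength outside the object mask and at reduced strength inside it, so the object's identity stays stable while the surrounding context is free to change.

```python
import numpy as np

def region_aware_noise(image, mask, t_strength=1.0, preserve=0.2, seed=0):
    """Add noise at full strength outside the object mask and at
    reduced strength inside it, so object regions keep their identity
    while the background/context can vary.

    image: (H, W, C) float array; mask: (H, W), nonzero inside object.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(image.shape)
    # Per-pixel noise scale: `preserve` inside the object, full outside.
    scale = np.where(mask[..., None] > 0, preserve, t_strength)
    return image + scale * noise
```

In the full framework the masks come from pre-trained detection and segmentation models, and the scaling is applied inside the diffusion sampling loop rather than as a single additive step.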