Feb 16, 2026 | arXiv:2602.14648

SketchingReality: From Freehand Scene Sketches To Photorealistic Images

Ahmed Bourouis, Mikhail Bessmeltsev, Yulia Gryaditskaya

AI Summary

The paper presents a method for generating photorealistic images from freehand scene sketches, addressing the abstraction and distortions inherent in such sketches. The authors propose a modulation-based approach that prioritizes semantic interpretation of the sketch over strict adherence to individual edge positions, together with a novel loss that enables training without ground-truth pixel-aligned images. The authors report that the method outperforms existing approaches in both semantic alignment with the input sketch and the realism and overall quality of the generated images.

Key Contribution

Freehand sketches can drive photorealistic image generation thanks to a novel loss that enables training without ground-truth pixel-aligned images, combined with a modulation-based approach that prioritizes semantic interpretation over strict edge alignment.

Abstract

Recent years have witnessed remarkable progress in generative AI, with natural language emerging as the most common conditioning input. As underlying models grow more powerful, researchers are exploring increasingly diverse conditioning signals, such as depth maps, edge maps, camera parameters, and reference images, to give users finer control over generation. Among different modalities, sketches are a natural and long-standing form of human communication, enabling rapid expression of visual concepts. Previous literature has largely focused on edge maps, often misnamed 'sketches', yet algorithms that effectively handle true freehand sketches, with their inherent abstraction and distortions, remain underexplored. We pursue the challenging goal of balancing photorealism with sketch adherence when generating images from freehand input. A key obstacle is the absence of ground-truth, pixel-aligned images: by their nature, freehand sketches do not have a single correct alignment. To address this, we propose a modulation-based approach that prioritizes semantic interpretation of the sketch over strict adherence to individual edge positions. We further introduce a novel loss that enables training on freehand sketches without requiring ground-truth pixel-aligned images. We show that our method outperforms existing approaches in both semantic alignment with freehand sketch inputs and in the realism and overall quality of the generated images.

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
