The paper introduces an approach for generating photorealistic images from freehand sketches, addressing the abstraction and distortions inherent in such sketches. The authors propose a modulation-based generative model that prioritizes semantic interpretation of the sketch over strict edge alignment, together with a novel loss that enables training without pixel-aligned ground-truth images. The authors report that the method outperforms existing approaches in both semantic alignment with the input sketch and image quality.
Freehand sketches can now drive photorealistic image generation without pixel-aligned ground-truth images, thanks to a modulation-based model that prioritizes semantic interpretation over strict edge alignment and a novel loss for training on freehand sketches.
Recent years have witnessed remarkable progress in generative AI, with natural language emerging as the most common conditioning input. As underlying models grow more powerful, researchers are exploring increasingly diverse conditioning signals, such as depth maps, edge maps, camera parameters, and reference images, to give users finer control over generation. Among different modalities, sketches are a natural and long-standing form of human communication, enabling rapid expression of visual concepts. Previous literature has largely focused on edge maps, often misnamed 'sketches', yet algorithms that effectively handle true freehand sketches, with their inherent abstraction and distortions, remain underexplored. We pursue the challenging goal of balancing photorealism with sketch adherence when generating images from freehand input. A key obstacle is the absence of ground-truth, pixel-aligned images: by their nature, freehand sketches do not have a single correct alignment. To address this, we propose a modulation-based approach that prioritizes semantic interpretation of the sketch over strict adherence to individual edge positions. We further introduce a novel loss that enables training on freehand sketches without requiring ground-truth pixel-aligned images. We show that our method outperforms existing approaches both in semantic alignment with freehand sketch inputs and in the realism and overall quality of the generated images.