This paper surveys the landscape of image generation models, providing a technical overview of VAEs, GANs, normalizing flows, autoregressive models, transformers, and diffusion models. It details the objectives, architectures, training, optimization, and limitations of each model type, while also covering recent advances in video generation. The survey concludes with a discussion of robustness, responsible deployment, and deepfake risks associated with these models.
Untangle the complex web of image generation models with this comprehensive technical history, spanning VAEs to diffusion models and highlighting failure modes along the way.
Image generation has advanced rapidly over the past decade, yet the literature remains fragmented across different models and application domains. This paper offers a comprehensive survey of breakthrough image generation models, including variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows, autoregressive and transformer-based generators, and diffusion-based methods. We provide a detailed technical walkthrough of each model type, covering its underlying objective, architectural building blocks, and algorithmic training steps. For each model type, we also present the relevant optimization techniques along with common failure modes and limitations. We then review recent developments in video generation and present the research that made it possible to move from still frames to high-quality videos. Lastly, we discuss the growing importance of robustness and responsible deployment of these models, including deepfake risks, detection, artifacts, and watermarking.