Feb 16, 2026 | arXiv:2602.15030

Image Generation with a Sphere Encoder

Kaiyu Yue, Menglin Jia, Ji Hou, Tom Goldstein

AI Summary

The paper introduces the Sphere Encoder, a generative model whose encoder maps images uniformly onto a spherical latent space and whose decoder turns random points on that sphere into new images. It can generate an image in a single forward pass and, with fewer than five steps, is competitive with many-step diffusion models at a small fraction of the inference cost. The model is trained using only image reconstruction losses and supports conditional generation.

Key Contribution

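The Sphere Encoder generates an image in a single forward pass by decoding a random point on a spherical latent space; looping the encoder/decoder a few times (fewer than five steps in total) makes it competitive with many-step diffusion models.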

Abstract

We introduce the Sphere Encoder, an efficient generative framework capable of producing images in a single forward pass and competing with many-step diffusion models using fewer than five steps. Our approach works by learning an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space. Trained solely through image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the sphere encoder approach yields performance competitive with state of the art diffusions, but with a small fraction of the inference cost. Project page is available at https://sphere-encoder.github.io .
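The generation procedure described in the abstract (sample a random point on the sphere, decode it, optionally loop the encoder/decoder a few times) can be sketched as follows. This is only an illustrative toy under stated assumptions: `toy_encoder` and `toy_decoder` are hypothetical stand-ins rather than the paper's networks, the unit radius and latent dimension are arbitrary choices, and sampling by normalizing a Gaussian vector is a standard way to draw uniform points on a sphere that the abstract does not itself specify.

```python
import math
import random


def project_to_sphere(v):
    """Rescale a nonzero vector to unit length (radius 1 is an assumption)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_on_sphere(dim, rng):
    """Draw a point uniformly on the unit sphere.

    Normalizing an isotropic Gaussian vector yields a uniformly
    distributed direction.
    """
    return project_to_sphere([rng.gauss(0.0, 1.0) for _ in range(dim)])


def toy_decoder(z):
    """Hypothetical decoder: maps latent coordinates in [-1, 1] to 'pixels' in [0, 1]."""
    return [0.5 * (x + 1.0) for x in z]


def toy_encoder(image):
    """Hypothetical encoder: maps 'pixels' back to a point on the sphere."""
    return project_to_sphere([2.0 * p - 1.0 for p in image])


def generate(dim, steps, rng):
    """Decode a random spherical latent, then loop encode/decode (steps - 1) more times."""
    image = toy_decoder(sample_on_sphere(dim, rng))
    for _ in range(steps - 1):
        image = toy_decoder(toy_encoder(image))
    return image


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate(dim=8, steps=3, rng=rng))
```

With `steps=1` this corresponds to single-pass generation; larger values mimic the few-step refinement loop, which here is a no-op because the toy encoder simply inverts the toy decoder.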

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 96

Year: 2026

Venue: N/A
