Search papers, labs, and topics across Lattice.
The paper introduces the Sphere Encoder, a generative model that maps images to a uniform spherical latent space and decodes random points on the sphere to generate new images. This approach achieves competitive image generation performance compared to diffusion models but requires significantly fewer inference steps. The model is trained using only image reconstruction losses and supports conditional generation.
Skip the slow diffusion grind: this generative model matches diffusion model quality in a single pass by encoding images onto a sphere.
We introduce the Sphere Encoder, an efficient generative framework capable of producing images in a single forward pass and competing with many-step diffusion models using fewer than five steps. Our approach works by learning an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space. Trained solely through image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the sphere encoder approach yields performance competitive with state of the art diffusions, but with a small fraction of the inference cost. Project page is available at https://sphere-encoder.github.io .