Mar 16, 2026 · arXiv:2603.15150

SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation

Shufan Li, Jiuxiang Gu, Kangning Liu, Zhe Lin, Aditya Grover, Jason Kuen

AI Summary

This paper introduces Stochastic Neighbor Cross Entropy Minimization (SNCE), a novel training objective for discrete image generation models with large VQ codebooks. SNCE supervises the model with a soft categorical distribution over neighboring tokens, weighted by the proximity between their code embeddings and the ground-truth image embedding. Experiments on ImageNet-256 generation, text-to-image synthesis, and image editing demonstrate that SNCE accelerates convergence and enhances generation quality relative to standard cross-entropy.

Key Contribution

Replacing one-hot targets with a soft distribution over neighboring codebook tokens speeds up convergence and improves image quality for discrete image generators with large VQ codebooks.

Abstract

Recent advances in discrete image generation have shown that scaling the VQ codebook size significantly improves reconstruction fidelity. However, training generative models with a large VQ codebook remains challenging, typically requiring a larger model and a longer training schedule. In this work, we propose Stochastic Neighbor Cross Entropy Minimization (SNCE), a novel training objective designed to address the optimization challenges of large-codebook discrete image generators. Instead of supervising the model with a hard one-hot target, SNCE constructs a soft categorical distribution over a set of neighboring tokens. The probability assigned to each token is proportional to the proximity between its code embedding and the ground-truth image embedding, encouraging the model to capture semantically meaningful geometric structure in the quantized embedding space. We conduct extensive experiments across class-conditional ImageNet-256 generation, large-scale text-to-image synthesis, and image editing tasks. Results show that SNCE significantly improves convergence speed and overall generation quality compared to standard cross-entropy objectives.
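The soft-target idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how neighbors are chosen or how proximity is measured, so the choices here (k nearest codes by squared Euclidean distance, a softmax over negative distances, and the names `snce_soft_targets`, `soft_cross_entropy`, `k`, and `temperature`) are assumptions made purely for illustration.

```python
import numpy as np

def snce_soft_targets(z, codebook, k=8, temperature=1.0):
    """Build soft targets over the k codes nearest to each embedding.

    z:        (N, D) ground-truth image embeddings (assumed pre-quantization).
    codebook: (V, D) VQ code embeddings.
    Returns a (N, V) array whose rows sum to 1 and have k nonzero entries.
    Assumption: proximity = softmax of negative squared Euclidean distance.
    """
    # Pairwise squared distances between embeddings and codes: (N, V).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    # Indices of the k nearest codes per embedding (unordered).
    nbr = np.argpartition(d2, k - 1, axis=1)[:, :k]
    nbr_d2 = np.take_along_axis(d2, nbr, axis=1)
    # Closer codes get higher probability; subtract the max for stability.
    logits = -nbr_d2 / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)
    # Scatter the neighbor weights into a dense distribution over all codes.
    targets = np.zeros_like(d2)
    np.put_along_axis(targets, nbr, w, axis=1)
    return targets

def soft_cross_entropy(logits, targets):
    """Mean cross entropy between model logits (N, V) and soft targets (N, V)."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())
```

With `k=1` the targets collapse to a one-hot vector on the nearest code, which recovers the standard cross-entropy objective that the abstract uses as its baseline.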

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
