The paper introduces Learnable Geometric Quantization (LGQ), a discrete image tokenizer that learns its discretization geometry end-to-end to address the trade-off between flexibility and stability in existing quantizers. LGQ uses temperature-controlled soft assignments, given by the posterior responsibilities of a Gaussian mixture, that minimize a variational free-energy objective and converge to nearest-neighbor quantization as the temperature approaches zero. Experiments on ImageNet with a controlled VQGAN-style backbone show that LGQ achieves stable optimization and balanced code utilization, improving rFID over FSQ and SimVQ while using fewer active codes than FSQ and a lower effective representation rate than SimVQ, especially at large vocabulary sizes.
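A short derivation shows why the soft assignments are Gaussian-mixture responsibilities, why they minimize a free energy, and why they collapse to nearest-neighbor lookup as τ → 0. It assumes equal mixture weights and a squared-Euclidean energy scaled by the temperature τ; the paper's exact parameterization may differ. Here z is an encoder latent, c_1, …, c_K are the codebook entries, d_k is the squared distance to entry k, and k* = argmin_k d_k is assumed unique.

```latex
\begin{aligned}
d_k &= \lVert z - c_k \rVert_2^2, \qquad
r_k(z;\tau) = \frac{\exp(-d_k/\tau)}{\sum_{j=1}^{K} \exp(-d_j/\tau)}
  \quad \text{(posterior of } \mathcal{N}(c_k, \tfrac{\tau}{2} I) \text{, equal weights)} \\
\mathcal{F}_\tau(r) &= \sum_{k=1}^{K} r_k d_k + \tau \sum_{k=1}^{K} r_k \log r_k,
  \qquad r \in \Delta^{K-1} \\
\frac{\partial}{\partial r_k}\Big[\mathcal{F}_\tau(r) + \lambda\big(\textstyle\sum_j r_j - 1\big)\Big]
  &= d_k + \tau(\log r_k + 1) + \lambda = 0
  \;\Rightarrow\; r_k \propto \exp(-d_k/\tau) \\
\min_{r} \mathcal{F}_\tau(r) &= -\tau \log \sum_{k=1}^{K} \exp(-d_k/\tau)
  \;\xrightarrow[\tau \to 0]{}\; \min_k d_k, \qquad
\frac{r_k}{r_{k^\star}} = \exp\!\Big(-\frac{d_k - d_{k^\star}}{\tau}\Big) \xrightarrow[\tau \to 0]{} 0 \;\; (k \neq k^\star)
\end{aligned}
```

Because the entropy term is strictly convex for τ > 0, the stationary point is the unique minimizer of the free energy on the simplex. As τ shrinks, every responsibility except that of the nearest code vanishes, which recovers hard nearest-neighbor assignment.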
Improve image tokenization by learning the quantization geometry itself, outperforming FSQ and SimVQ in rFID while using fewer active codes and a lower effective representation rate.
Discrete image tokenization is a key bottleneck for scalable visual generation: a tokenizer must remain compact for efficient latent-space priors while preserving semantic structure and using discrete capacity effectively. Existing quantizers face a trade-off: vector-quantized tokenizers learn flexible geometries but often suffer from biased straight-through optimization, codebook under-utilization, and representation collapse at large vocabularies. Structured scalar or implicit tokenizers ensure stable, near-complete utilization by design, yet rely on fixed discretization geometries that may allocate capacity inefficiently under heterogeneous latent statistics. We introduce Learnable Geometric Quantization (LGQ), a discrete image tokenizer that learns discretization geometry end-to-end. LGQ replaces hard nearest-neighbor lookup with temperature-controlled soft assignments, enabling fully differentiable training while recovering hard assignments at inference. The assignments correspond to posterior responsibilities of an isotropic Gaussian mixture and minimize a variational free-energy objective, provably converging to nearest-neighbor quantization in the low-temperature limit. LGQ combines a token-level peakedness regularizer with a global usage regularizer to encourage confident yet balanced code utilization without imposing rigid grids. Under a controlled VQGAN-style backbone on ImageNet across multiple vocabulary sizes, LGQ achieves stable optimization and balanced utilization. At a 16K codebook size, LGQ improves rFID by 11.88% over FSQ while using 49.96% fewer active codes, and improves rFID by 6.06% over SimVQ with a 49.45% lower effective representation rate, achieving comparable fidelity with substantially fewer active entries. Our GitHub repository is available at: https://github.com/KurbanIntelligenceLab/LGQ
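The following is a minimal plain-Python sketch of the forward computations the abstract describes: soft assignments as mixture responsibilities, their free energy, hard nearest-neighbor lookup at inference, and the two regularizers. It rests on several assumptions. Mixture weights are equal, distances are squared Euclidean scaled by `tau`, and the peakedness and usage regularizers take entropy-based forms. All function names (`soft_assign`, `peakedness_loss`, `usage_loss`, and so on) are hypothetical and not taken from the LGQ repository. The paper's exact regularizers, weights, and temperature schedule may differ, and real training would run these operations in an autodiff framework so that gradients flow through the soft path.

```python
import math
import random

EPS = 1e-12  # guards log(0) when a responsibility underflows to exactly zero


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2 between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_assign(z, codebook, tau):
    """Responsibilities of an equal-weight isotropic Gaussian mixture for one latent z.

    r_k = exp(-d_k / tau) / sum_j exp(-d_j / tau), with d_k = ||z - c_k||^2.
    Returns (r, d) as lists of length K.
    """
    d = [sq_dist(z, c) for c in codebook]
    m = min(d)  # shift by the smallest distance for numerical stability
    w = [math.exp(-(dk - m) / tau) for dk in d]
    s = sum(w)
    return [wk / s for wk in w], d


def soft_quantize(z, codebook, tau):
    """Responsibility-weighted code vector (the training-time soft path)."""
    r, _ = soft_assign(z, codebook, tau)
    return [sum(rk * c[i] for rk, c in zip(r, codebook)) for i in range(len(z))]


def hard_quantize(z, codebook):
    """Inference-time nearest-neighbor lookup: returns (index, code vector)."""
    k = min(range(len(codebook)), key=lambda j: sq_dist(z, codebook[j]))
    return k, codebook[k]


def free_energy(r, d, tau):
    """Expected distortion minus tau times the assignment entropy, for one token."""
    return sum(rk * dk for rk, dk in zip(r, d)) + tau * sum(rk * math.log(rk + EPS) for rk in r)


def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(pk * math.log(pk + EPS) for pk in p)


def peakedness_loss(rs):
    """Assumed token-level regularizer: mean per-token entropy (low = confident assignments)."""
    return sum(entropy(r) for r in rs) / len(rs)


def usage_loss(rs):
    """Assumed global regularizer: negative entropy of batch-averaged usage (low = balanced)."""
    K = len(rs[0])
    p = [sum(r[k] for r in rs) / len(rs) for k in range(K)]
    return -entropy(p)


if __name__ == "__main__":
    rng = random.Random(0)
    codebook = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]
    batch = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(256)]
    for tau in (1.0, 0.1, 0.01):
        rs = [soft_assign(z, codebook, tau)[0] for z in batch]
        print(f"tau={tau}: mean max r={sum(max(r) for r in rs) / len(rs):.3f}, "
              f"token entropy={peakedness_loss(rs):.3f}, usage loss={usage_loss(rs):.3f}")
```

Lowering `tau` sharpens each token's responsibilities toward its nearest code, so the mean maximum responsibility rises and the token entropy falls. By contrast, `usage_loss` approaches its minimum of -log K only when the batch spreads its mass evenly across codes. Minimizing both therefore pulls toward confident yet balanced assignments. A training objective would presumably add both terms, with some weights, to the reconstruction loss.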