Feb 24, 2026 | arXiv:2602.21133

SOM-VQ: Topology-Aware Tokenization for Interactive Generative Models

Alessandro Londei, Denise Lanzieri, Matteo Benati

AI Summary

The paper introduces SOM-VQ, a novel tokenization method that integrates vector quantization with Self-Organizing Maps (SOMs) to create discrete codebooks with explicit low-dimensional topology. This topology-aware approach ensures that neighboring tokens on the learned grid correspond to semantically similar states, enabling geometric manipulation of the latent space for interpretable control. Experiments in human motion generation demonstrate that SOM-VQ yields more learnable token sequences and allows intuitive human-in-the-loop control via grid-based sampling, facilitating controlled divergence and convergence from reference sequences.
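To make the idea of a topology-aware codebook concrete, here is a minimal sketch of a classic SOM-style update applied to a VQ codebook. It is an illustration under stated assumptions, not the paper's training procedure: the function name `som_vq_update`, the row-major grid layout, the Gaussian neighborhood, and the `lr` and `sigma` parameters are all hypothetical choices made for this example.

```python
import math

def som_vq_update(codebook, grid_w, x, lr=0.1, sigma=1.0):
    """Apply one SOM-style update in place and return the winning index.

    codebook: list of code vectors (lists of floats), laid out row-major
              on a grid with `grid_w` columns (assumed layout).
    x:        one encoder output (list of floats).
    """
    # VQ step: the winner is the code nearest to x in feature space.
    def sq_dist(code):
        return sum((ci - xi) ** 2 for ci, xi in zip(code, x))

    bmu = min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))
    bmu_r, bmu_c = divmod(bmu, grid_w)

    # SOM step: every code moves toward x, weighted by a Gaussian of its
    # distance to the winner measured on the grid, not in feature space.
    for k, code in enumerate(codebook):
        r, c = divmod(k, grid_w)
        grid_d2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
        weight = math.exp(-grid_d2 / (2.0 * sigma ** 2))
        for i in range(len(code)):
            code[i] += lr * weight * (x[i] - code[i])
    return bmu
```

Because grid neighbors of the winner are pulled toward the same input, codes that sit close together on the grid tend to end up close together in feature space, which is the property the summary describes. A plain VQ update would move only the winning code.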

Key Contribution

SOM-VQ lets users steer a generative model by moving tokens on a learned grid, where neighboring cells correspond to semantically similar states.

Abstract

Vector-quantized representations enable powerful discrete generative models but lack semantic structure in token space, limiting interpretable human control. We introduce SOM-VQ, a tokenization method that combines vector quantization with Self-Organizing Maps to learn discrete codebooks with explicit low-dimensional topology. Unlike standard VQ-VAE, SOM-VQ uses topology-aware updates that preserve neighborhood structure: nearby tokens on a learned grid correspond to semantically similar states, enabling direct geometric manipulation of the latent space. We demonstrate that SOM-VQ produces more learnable token sequences in the evaluated domains while providing an explicit navigable geometry in code space. Critically, the topological organization enables intuitive human-in-the-loop control: users can steer generation by manipulating distances in token space, achieving semantic alignment without frame-level constraints. We focus on human motion generation - a domain where kinematic structure, smooth temporal continuity, and interactive use cases (choreography, rehabilitation, HCI) make topology-aware control especially natural - demonstrating controlled divergence and convergence from reference sequences through simple grid-based sampling. SOM-VQ provides a general framework for interpretable discrete representations applicable to music, gesture, and other interactive generative domains.
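The grid-based sampling mentioned in the abstract can be sketched as follows. This is a simplified illustration and an assumption about how such control might look, not the authors' method: it resamples each reference token directly from a square grid neighborhood rather than biasing a generative model's predictions, and the names `sample_near` and `radius`, along with the row-major token indexing, are hypothetical.

```python
import random

def sample_near(ref_tokens, grid_h, grid_w, radius, rng=None):
    """For each reference token, pick a token within `radius` grid cells.

    Tokens are assumed to index a grid_h x grid_w grid in row-major order.
    Distance is Chebyshev (square neighborhood), clipped at the grid edges.
    """
    rng = rng or random.Random()
    out = []
    for t in ref_tokens:
        r, c = divmod(t, grid_w)
        # All grid cells whose row and column are within `radius` of (r, c).
        candidates = [
            rr * grid_w + cc
            for rr in range(max(0, r - radius), min(grid_h, r + radius + 1))
            for cc in range(max(0, c - radius), min(grid_w, c + radius + 1))
        ]
        out.append(rng.choice(candidates))
    return out

# radius=0 reproduces the reference; a larger radius allows more divergence.
reference = [0, 5, 10, 15]
print(sample_near(reference, grid_h=4, grid_w=4, radius=1, rng=random.Random(0)))
```

In this sketch a single scalar controls how far the output may drift from the reference: shrinking `radius` toward zero corresponds to convergence, and growing it corresponds to controlled divergence. This only carries semantic meaning if grid distance tracks similarity, which is what the topology-aware codebook is meant to provide.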

Architecture Design (Transformers, SSMs, MoE) | Interpretability & Mechanistic Interp | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
