This paper introduces Variance-Invariance-Sketching Regularization (VISReg), a self-supervised learning method that replaces the covariance regularization of VICReg with a Sliced-Wasserstein-based sketching objective. VISReg enforces the full distributional shape of embeddings while a separate variance term controls their scale, addressing the limitations of VICReg (second-order statistics only) and SIGReg (limited flexibility and vanishing gradients under collapse). The authors report that it outperforms existing regularization on low-quality datasets, reaches state-of-the-art results on out-of-distribution benchmarks when pre-trained on ImageNet-1K, and matches DINOv2's out-of-distribution performance when pre-trained on ImageNet-22K, despite DINOv2 using 10x more data (LVD-142M).
VISReg stabilizes embedding training and, pre-trained on ImageNet-22K, matches DINOv2's out-of-distribution performance with roughly one tenth of the pre-training data.
Self-supervised learning methods prevent embedding collapse via modeling heuristics or explicit regularization of the embedding space. Among the latter, VICReg decomposes regularization into variance and covariance objectives, offering flexibility and interpretability. However, covariance captures only second-order statistics -- encouraging decorrelation but failing to enforce the full distributional shape needed for stable training. Sketching-based methods such as SIGReg address this by aligning embeddings to an isotropic Gaussian, but lack flexibility and suffer from vanishing gradients under collapse. We propose Variance-Invariance-Sketching Regularization (VISReg), which replaces covariance with a Sliced-Wasserstein-based sketching objective that enforces full distributional shape, while retaining a variance term for scale control. By decoupling scale and shape, VISReg combines VICReg's flexibility with the distributional rigor of sketching methods, providing robust gradients even under collapse. We show that VISReg scales linearly, outperforms existing regularization on low-quality datasets, and is resilient to long-tailed and low-rank regimes. Pre-trained on ImageNet-1K, VISReg achieves state-of-the-art performance on out-of-distribution datasets. Pre-trained on ImageNet-22K, it matches DINOv2's OOD performance despite the latter using 10x more data (LVD-142M). Project and code: https://haiyuwu.github.io/visreg.
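To make the scale/shape split described in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the isotropic Gaussian target, the midpoint-quantile approximation, the hinge form of the variance term (borrowed from VICReg), and the loss weights are all assumptions chosen for illustration. The shape term projects embeddings onto random unit directions, sorts each 1D projection, and compares it to standard normal quantiles, which is a Monte Carlo estimate of a squared sliced 2-Wasserstein distance.

```python
import numpy as np
from statistics import NormalDist


def gaussian_quantiles(n):
    """Quantiles of N(0, 1) at the midpoints (i + 0.5) / n, in ascending order."""
    nd = NormalDist()
    return np.array([nd.inv_cdf((i + 0.5) / n) for i in range(n)])


def sliced_wasserstein_to_gaussian(z, num_slices=64, rng=None):
    """Shape term (assumed form): squared sliced 2-Wasserstein distance
    between the batch of embeddings z (n x d) and an isotropic Gaussian."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = z.shape
    # Random unit directions, one per column.
    directions = rng.standard_normal((d, num_slices))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    # In 1D, optimal transport matches sorted samples to sorted targets.
    projections = np.sort(z @ directions, axis=0)
    target = gaussian_quantiles(n)[:, None]
    return float(np.mean((projections - target) ** 2))


def variance_hinge(z, gamma=1.0, eps=1e-4):
    """Scale term (assumed VICReg-style hinge): penalize any embedding
    dimension whose standard deviation falls below gamma."""
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, gamma - std)))


def visreg_like_regularizer(z, shape_weight=1.0, scale_weight=1.0):
    """Hypothetical combination of the two terms; the weights are placeholders."""
    return (shape_weight * sliced_wasserstein_to_gaussian(z)
            + scale_weight * variance_hinge(z))
```

In this sketch a fully collapsed batch (all embeddings identical) still yields a large shape term, because the sorted projections are constant while the target quantiles are spread out. An invariance term between augmented views, which the method's name implies, would be added on top and is omitted here.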