Apr 27, 2026 | arXiv:2604.24469

Geometric Analysis of Self-Supervised Vision Representations for Semantic Image Retrieval

Esteban Rodríguez-Betancourt, Edgar Casasola-Murillo

AI Summary

This paper investigates the suitability of self-supervised vision representations for content-based image retrieval (CBIR) using vector databases and approximate nearest neighbor (ANN) search. It finds that the latent space geometry of these representations, particularly anisotropy and skewness, significantly impacts ANN indexing performance. The study demonstrates that representations with higher isotropy and local purity lead to better semantic retrieval performance by aligning better with the distance-based assumptions of ANN indexes.

Key Contribution

Self-supervised vision models that ace linear probing can still flop at semantic image retrieval because of skewed latent space geometry that breaks approximate nearest neighbor search.

Abstract

Content-based image retrieval (CBIR) systems enable users to search images based on visual content instead of relying on metadata. The text domain has benefited from vector search of representations created with unsupervised methods such as BERT. However, modern self-supervised learning (SSL) methods for vision are mostly not reported in CBIR-related literature, which instead relies on supervised models or multi-modal methods that align text and vision. We evaluate how the representations learned by modern self-supervised learning methods for vision perform under typical retrieval stacks that leverage vector databases and nearest neighbor search. Our evaluation reveals that the latent space geometry impacts approximate nearest neighbor (ANN) indexing. Specifically, highly anisotropic representations with high skewness produced by several modern SSL methods degrade the performance of partition-based and hashing-based search, even if their own linear probe or k-NN accuracy is not affected. In contrast, representations with higher isotropy and local purity better satisfy the distance-based assumptions of ANN indexes, leading to improved semantic retrieval performance.
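
The abstract does not say how anisotropy, skewness, or local purity are measured, so the following is only a rough sketch using common proxies, not the authors' method. It assumes embeddings are given as a NumPy array of shape (n, d) with an integer label per row. The function names (`mean_cosine_similarity`, `per_dimension_skewness`, `local_purity`) and the choice of cosine similarity and `k=5` are illustrative assumptions.

```python
import numpy as np


def _unit_rows(x):
    """Scale every embedding (row) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mean_cosine_similarity(x):
    """Anisotropy proxy: average cosine similarity over all distinct pairs.

    Values near 0 suggest directions are spread evenly (more isotropic);
    values near 1 suggest embeddings crowd into a narrow cone.
    """
    u = _unit_rows(x)
    sims = u @ u.T
    n = len(u)
    # Exclude the diagonal (self-similarity is always 1).
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))


def per_dimension_skewness(x):
    """Sample skewness (third standardized moment) of each embedding dimension."""
    centered = x - x.mean(axis=0)
    std = centered.std(axis=0)
    return (centered ** 3).mean(axis=0) / std ** 3


def local_purity(x, labels, k=5):
    """Assumed purity proxy: mean fraction of each point's k nearest
    neighbors (by cosine similarity) that share the point's label."""
    u = _unit_rows(x)
    sims = u @ u.T
    np.fill_diagonal(sims, -np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(-sims, axis=1)[:, :k]
    return (labels[neighbors] == labels[:, None]).mean()
```

Under these assumptions, a representation could be screened before indexing by comparing the three numbers across candidate models: a high mean cosine similarity or strongly skewed dimensions would be a hint that partition-based or hashing-based indexes may struggle, even when exact k-NN accuracy looks fine.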

Computer Vision, Multimodal Models, Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 31

Year: 2026

Venue: N/A
