The paper introduces ELViS, an image-to-image similarity model designed for strong generalization across diverse, unseen domains by operating in similarity space rather than representation space. ELViS leverages local descriptor correspondences, refines similarities using optimal transport with data-dependent gains, and aggregates strong correspondences via a voting process. Evaluated on a new benchmark of eight datasets, ELViS significantly outperforms existing methods in out-of-domain scenarios while being computationally efficient.
ELViS generalizes image similarity to unseen domains with a simple, efficient, and interpretable model that operates on local descriptor correspondences.
Large-scale instance-level training data is scarce, so models are typically trained on domain-specific datasets. Yet in real-world retrieval, they must handle diverse domains, making generalization to unseen data critical. We introduce ELViS, an image-to-image similarity model that generalizes effectively to unseen domains. Unlike conventional approaches, our model operates in similarity space rather than representation space, promoting cross-domain transfer. It leverages local descriptor correspondences, refines their similarities through an optimal transport step with data-dependent gains that suppress uninformative descriptors, and aggregates strong correspondences via a voting process into an image-level similarity. This design injects strong inductive biases, yielding a simple, efficient, and interpretable model. To assess generalization, we compile a benchmark of eight datasets spanning landmarks, artworks, products, and multi-domain collections, and evaluate ELViS as a re-ranking method. Our experiments show that ELViS outperforms competing methods by a large margin in out-of-domain scenarios and on average, while requiring only a fraction of their computational cost. Code available at: https://github.com/pavelsuma/ELViS/
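The three stages named in the abstract (local descriptor correspondences, an optimal transport refinement, and a voting-style aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. All function names, the threshold `tau`, the regularization `eps`, and the way gains are used here (as per-descriptor transport mass supplied by the caller) are assumptions made for illustration only. In the paper the gains are data-dependent, whereas this sketch takes them as plain inputs.

```python
import math

def cosine_matrix(xs, ys):
    """Pairwise cosine similarity between two lists of descriptor vectors."""
    def unit(v):
        n = math.sqrt(sum(c * c for c in v)) or 1.0
        return [c / n for c in v]
    xs, ys = [unit(x) for x in xs], [unit(y) for y in ys]
    return [[sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]

def sinkhorn(sim, row_mass, col_mass, eps=0.1, iters=50):
    """Entropy-regularized transport plan via plain Sinkhorn iterations."""
    k = [[math.exp(s / eps) for s in row] for row in sim]
    n, m = len(k), len(k[0])
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [row_mass[i] / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass[j] / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * k[i][j] * v[j] for j in range(m)] for i in range(n)]

def image_similarity(desc_a, desc_b, gains_a=None, gains_b=None, tau=0.5):
    """Hypothetical image-level score from local descriptor correspondences.

    gains_* are assumed per-descriptor weights: a low gain gives a
    descriptor little transport mass, so it contributes little to the score.
    """
    sim = cosine_matrix(desc_a, desc_b)
    gains_a = gains_a or [1.0] * len(desc_a)
    gains_b = gains_b or [1.0] * len(desc_b)
    row_mass = [g / sum(gains_a) for g in gains_a]
    col_mass = [g / sum(gains_b) for g in gains_b]
    plan = sinkhorn(sim, row_mass, col_mass)
    # "Voting" stand-in: only correspondences with similarity >= tau count,
    # each weighted by the mass the transport plan assigns to it.
    return sum(
        p * s
        for plan_row, sim_row in zip(plan, sim)
        for p, s in zip(plan_row, sim_row)
        if s >= tau
    )

# Toy usage with 2-D descriptors (real local descriptors are much larger).
a = [[1.0, 0.0], [0.0, 1.0]]
c = [[-1.0, 0.0], [0.0, -1.0]]
print(image_similarity(a, a))  # close to 1 for identical descriptor sets
print(image_similarity(a, c))  # no correspondence passes the threshold
```

Because the score is computed from the similarity matrix alone, nothing in the sketch depends on what the descriptors represent, which is the intuition behind operating in similarity space rather than representation space.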