This paper introduces STABLE, a hybrid approximate nearest neighbor search (Hybrid ANNS) framework designed to handle heterogeneity in data distribution. It targets two challenges: the compatibility barrier for similarity magnitude heterogeneity and the tolerance bottleneck to attribute cardinality. STABLE uses an enhAnced heterogeneoUs semanTic perceptiOn (AUTO) metric to jointly measure feature similarity and attribute consistency, and builds a Heterogeneous sEmantic reLation graPh (HELP) index on top of AUTO. A dynamic heterogeneity routing method then keeps search efficient, and the authors report superior performance on five feature vector benchmarks with various attribute cardinalities.
Achieve state-of-the-art hybrid nearest neighbor search by explicitly modeling and mitigating data heterogeneity, a problem often overlooked in existing approaches.
Hybrid Approximate Nearest Neighbor Search (Hybrid ANNS) is a foundational search technology for large-scale heterogeneous data and has gained significant attention in both academia and industry. However, current approaches overlook the heterogeneity in data distribution and therefore ignore two major challenges: the Compatibility Barrier for Similarity Magnitude Heterogeneity and the Tolerance Bottleneck to Attribute Cardinality. To overcome these issues, we propose the robuSt heTerogeneity-Aware hyBrid retrievaL framEwork, STABLE, designed for accurate, efficient, and robust hybrid ANNS on datasets with various distributions. Specifically, we introduce an enhAnced heterogeneoUs semanTic perceptiOn (AUTO) metric that jointly measures feature similarity and attribute consistency, addressing similarity magnitude heterogeneity and improving robustness to datasets with various attribute cardinalities. We then construct our Heterogeneous sEmantic reLation graPh (HELP) index based on AUTO to organize heterogeneous semantic relations. Finally, we employ a novel Dynamic Heterogeneity Routing method to ensure efficient search. Extensive experiments on five feature vector benchmarks with various attribute cardinalities demonstrate the superior performance of STABLE.
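To make the idea of a joint measurement concrete, the following is a minimal brute-force sketch of hybrid search in which a feature distance and an attribute mismatch term are fused into one score. It is not the AUTO metric, the HELP index, or the routing method from the paper, none of which are specified in this abstract. The function names, the mean-distance rescaling, and the `weight` parameter are all assumptions made for illustration.

```python
import math


def feature_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attribute_mismatch(query_attrs, item_attrs):
    """Fraction of attribute positions that differ, in the range 0.0 to 1.0."""
    if not query_attrs:
        return 0.0
    differing = sum(1 for q, v in zip(query_attrs, item_attrs) if q != v)
    return differing / len(query_attrs)


def joint_distance(query, item, scale, weight=1.0):
    """Hypothetical fused score (an assumption, not the paper's AUTO metric).

    The feature distance is divided by `scale` so that its magnitude is
    comparable to the attribute term, which always lies in [0, 1].
    """
    q_vec, q_attrs = query
    v_vec, v_attrs = item
    scaled = feature_distance(q_vec, v_vec) / scale
    return scaled + weight * attribute_mismatch(q_attrs, v_attrs)


def hybrid_search(query, items, k, weight=1.0):
    """Return indices of the k items with the smallest fused score.

    This is an exhaustive scan; a graph index would avoid scoring every item.
    """
    if not items:
        return []
    dists = [feature_distance(query[0], vec) for vec, _ in items]
    # Rescale by the mean feature distance; fall back to 1.0 if it is zero.
    scale = (sum(dists) / len(dists)) or 1.0
    order = sorted(
        range(len(items)),
        key=lambda i: joint_distance(query, items[i], scale, weight),
    )
    return order[:k]


# Toy data: (feature vector, attribute tuple) pairs.
items = [
    ([0.0, 0.0], ("red",)),
    ([0.1, 0.0], ("blue",)),
    ([1.0, 1.0], ("red",)),
]
query = ([0.0, 0.0], ("red",))
top_two = hybrid_search(query, items, k=2, weight=5.0)
```

With a large `weight`, an item whose attributes match outranks a closer item whose attributes differ; with `weight=0.0` the search reduces to plain nearest neighbor search on the feature vectors. The rescaling step is one simple way to keep the two terms at comparable magnitudes, which is the kind of mismatch the abstract calls similarity magnitude heterogeneity.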