Search papers, labs, and topics across Lattice.
The paper addresses the problem of efficiently merging proximity graph-based approximate nearest neighbor (AKNN) indexes, which is crucial for scaling AKNN search to large datasets when memory constraints prevent building a single index. They introduce a Reverse Neighbor Sliding Merge (RNSM) algorithm that leverages structural information to accelerate the merging of two indexes. Furthermore, they propose a Merge Order Selection (MOS) strategy to minimize redundant merge operations when combining multiple sub-indexes.
Achieve up to 5.48x speedup in merging proximity graph indexes for AKNN search by intelligently exploiting structural information, outperforming naive reconstruction by nearly 10x.
Approximate $k$ nearest neighbor (AKNN) search in high-dimensional space is a foundational problem in vector databases with widespread applications. Among the numerous AKNN indexes, Proximity Graph-based indexes achieve state-of-the-art search efficiency across various benchmarks. However, their extensive distance computations of high-dimensional vectors lead to slow construction and substantial memory overhead. The limited memory capacity often prevents building the entire index at once when handling large-scale datasets. A common practice is to build multiple sub-indexes separately. However, directly searching on these separated indexes severely compromises search efficiency, as queries cannot leverage cross-graph connections. Therefore, efficient graph index merging is crucial for multi-index searching. In this paper, we focus on efficient two-index merging and the merge order of multiple indexes for AKNN search. To achieve this, we propose a reverse neighbor sliding merge (RNSM) that exploits structural information to boost merging efficiency. We further investigate merge order selection (MOS) to reduce the merging cost by eliminating redundant merge operations. Experiments show that our approach yields up to a 5.48$\times$ speedup over existing index merge methods and 9.92$\times$ speedup over index reconstruction, while maintaining expected superior search performance. Moreover, our method scales efficiently to 100 million vectors with 50 partitions, maintaining consistent speedups.