This paper introduces HELMSMAN, a clustering-based approximate nearest neighbor search (ANNS) system built to address the growing memory footprint, and the resulting hardware cost, of RedNote's existing in-memory HNSW deployment. By combining an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated construction pipelines on all-flash servers, HELMSMAN cuts hardware costs by over 90% and completes billion-scale index rebuilds within hours. In production, 40 machines now host ANNS workloads that previously required about 35,000 cores and 0.35 PB of DRAM, and the deployment has run stably for several months.
HELMSMAN cuts hardware costs by over 90% and enables billion-scale index rebuilds within hours for large-scale ANNS workloads.
RedNote (a.k.a. Xiaohongshu, a global-scale social network platform) widely adopts approximate nearest neighbor search (ANNS) to power its search, recommendation, and advertising services. Because of demanding Service Level Agreements (SLAs), we have had to rely on in-memory graph-based ANNS (i.e., HNSW) to provide high throughput and low latency. However, the ever-growing user base and content volume have led to an explosive increase in memory footprint and, consequently, huge CapEx and OpEx. After exploring various alternatives, we find that building a clustering-based ANNS on top of all-flash servers is promising. Yet we still experience severe overheads from the kernel I/O stack, a fixed pruning strategy, and slow index construction. We present HELMSMAN, a high-performance and cost-effective clustering-based ANNS system that combines an ANNS-oriented userspace storage stack, a leveling-learned pruning module, and GPU-accelerated construction pipelines. HELMSMAN saves over 90% of hardware costs and enables billion-scale index (re)builds within hours. In the current production deployment, which has operated stably for several months, 40 machines now host ANNS workloads that previously required about 35,000 cores and 0.35 PB of DRAM.
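To make the terms "clustering-based ANNS" and "fixed pruning strategy" concrete, the following is a minimal, generic sketch of an IVF-style index in pure Python. It is not HELMSMAN's implementation: the function names (`build_index`, `search`), the k-means routine, and the `nprobe` parameter are illustrative assumptions. A constant `nprobe` (always scanning the same number of nearest clusters) is one common form of fixed pruning; the abstract does not say which fixed strategy was used, nor how the learned module replaces it.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))

def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for i, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[i] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids

def build_index(vectors, k, seed=0):
    """Cluster the vectors and store each vector id in its cluster's posting list."""
    centroids = kmeans(vectors, k, seed=seed)
    postings = [[] for _ in range(k)]
    for vid, v in enumerate(vectors):
        postings[nearest_centroid(v, centroids)].append(vid)
    return centroids, postings

def search(query, vectors, centroids, postings, top_k, nprobe):
    """Scan only the nprobe clusters nearest to the query (fixed pruning)."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [vid for c in order[:nprobe] for vid in postings[c]]
    candidates.sort(key=lambda vid: sq_dist(query, vectors[vid]))
    return candidates[:top_k]

# Toy data: 200 random 4-dimensional vectors grouped into 8 clusters.
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
centroids, postings = build_index(vectors, k=8)
query = [rng.gauss(0, 1) for _ in range(4)]
approx = search(query, vectors, centroids, postings, top_k=5, nprobe=2)
```

In this sketch, raising `nprobe` trades more scanning for better recall, and setting it to the number of clusters degenerates into an exact brute-force scan. In a flash-resident design, each posting list would presumably be read from storage rather than held in DRAM, which is why the I/O path and the number of clusters probed per query both matter for cost.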