The paper introduces HierarchicalKV (HKV), a novel GPU hash table library that implements cache semantics for continuous online embedding storage, addressing the HBM limitations of traditional dictionary-semantic hash tables. HKV employs cache-line-aligned buckets, score-driven upsert, dynamic dual-bucket selection, and triple-group concurrency, along with tiered key-value separation for scaling beyond HBM capacity. Benchmarks on an NVIDIA H100 NVL GPU demonstrate that HKV achieves up to 3.9 B-KV/s find throughput, outperforming existing dictionary-semantic baselines such as WarpCore by up to 1.4x and indirection-based baselines by 2.6-9.4x.
Stop wasting precious GPU memory: this new cache-semantic hash table library achieves up to 3.9 billion key-value lookups per second, outperforming indirection-based approaches by up to 9.4x.
Traditional GPU hash tables preserve every inserted key -- a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity. We challenge this assumption with cache semantics, where policy-driven eviction is a first-class operation. We introduce HierarchicalKV (HKV), the first general-purpose GPU hash table library whose normal full-capacity operating contract is cache-semantic: each full-bucket upsert (update-or-insert) is resolved in place by eviction or admission rejection rather than by rehashing or capacity-induced failure. HKV co-designs four core mechanisms -- cache-line-aligned buckets, in-line score-driven upsert, score-based dynamic dual-bucket selection, and triple-group concurrency -- and uses tiered key-value separation as a scaling enabler beyond HBM. On an NVIDIA H100 NVL GPU, HKV achieves up to 3.9 billion key-value pairs per second (B-KV/s) find throughput, stable across load factors 0.50-1.00 (<5% variation), and delivers 1.4x higher find throughput than WarpCore (the strongest dictionary-semantic GPU baseline at lambda=0.50) and up to 2.6-9.4x over indirection-based GPU baselines. Since its open-source release in October 2022, HKV has been integrated into multiple open-source recommendation frameworks.
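The cache-semantic upsert contract described above can be illustrated with a small host-side sketch. This is not HKV's actual CUDA implementation or API; the `Slot`/`Bucket` types and the greater-than admission rule are illustrative assumptions. The sketch shows the key idea: when a bucket is full, the incoming pair either evicts the lowest-score resident or is rejected, so the operation always resolves in place without rehashing or a capacity failure.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical single-bucket model of cache-semantic upsert.
// In HKV the real bucket is cache-line-aligned and probed by a warp;
// here a plain array stands in for that layout.
struct Slot {
    uint64_t key = 0;
    float    value = 0.0f;
    uint64_t score = 0;   // policy-defined score (e.g. timestamp or frequency)
    bool     occupied = false;
};

template <std::size_t N>
struct Bucket {
    std::array<Slot, N> slots{};

    // Returns true if (key, value) is resident after the call.
    bool upsert(uint64_t key, float value, uint64_t score) {
        // 1. Update in place if the key is already resident.
        for (auto& s : slots) {
            if (s.occupied && s.key == key) {
                s.value = value;
                s.score = score;
                return true;
            }
        }
        // 2. Admit into an empty slot if one exists.
        for (auto& s : slots) {
            if (!s.occupied) {
                s = {key, value, score, true};
                return true;
            }
        }
        // 3. Bucket full: evict the lowest-score resident if the
        //    incoming score beats it; otherwise reject the admission.
        auto min_it = std::min_element(
            slots.begin(), slots.end(),
            [](const Slot& a, const Slot& b) { return a.score < b.score; });
        if (score > min_it->score) {
            *min_it = {key, value, score, true};
            return true;   // eviction: old minimum-score pair is dropped
        }
        return false;      // admission rejected; bucket unchanged
    }
};
```

Under this contract a full bucket is the normal operating state, not an error: every upsert terminates locally, which is what lets the table run at load factor 1.00 without rehash stalls.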