Lattice AI Research

Research focus

Architecture Design (Transformers, SSMs, MoE) (1) · Distributed Systems & Hardware (1) · Inference & Quantization (1) · Eval Frameworks & Benchmarks (1)

Frequent co-authors

Haidong Rong (1) · Jiashu Yao (1) · Matthias Langer (1) · Shijie Liu (1)

Papers (2)

NVIDIA · Mar 17, 2026 · also BIT, ByteDance, Tencent AI, Vipshop

HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage

Stop wasting precious GPU memory: this new cache-semantic hash table library achieves up to 3.9 billion key-value lookups per second, outperforming standard approaches by up to 9.4x.

Haidong Rong, Jiashu Yao, Matthias Langer +11

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

NVIDIA · Nov 10, 2025

Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

Forget closed-source embedding models: llama-embed-nemotron-8b just topped the MMTEB leaderboard with fully open weights and a data recipe you can actually reproduce.

Yauhen Babakhin, Radek Osmulski, Ronay Ak +511

Eval Frameworks & Benchmarks · Natural Language Processing · Open-Source Models & Weights

Even Oldridge