CNRS, Sorbonne | Mar 10, 2026 | arXiv:2603.09933

A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models

Yash Kankanampati, Yuxuan Zong, Nadi Tomeh, Benjamin Piwowarski, Joseph Le Roux

AI Summary

This paper introduces a novel token pruning framework for late-interaction retrieval models like ColBERT, grounded in hyperspace geometry and Voronoi cell estimation. The method interprets each token's influence based on its Voronoi region in the embedding space, enabling principled pruning that balances retrieval quality and index size reduction. Experiments demonstrate the approach's effectiveness as a pruning strategy and its utility for improving and interpreting token-level behavior in dense retrieval.

Key Contribution

Token pruning in dense retrieval gets a geometric upgrade: Voronoi cells offer a principled way to shrink your index without sacrificing search quality.

Abstract

Late-interaction models like ColBERT offer competitive performance across various retrieval tasks, but they require storing a dense embedding for each document token, which leads to substantial index storage overhead. Prior work addresses this by pruning low-importance token embeddings using statistical and empirical measures, but these methods often either lack formal grounding or are ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as a measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Through our experiments, we demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.
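The idea of measuring a token's influence by its Voronoi region can be illustrated with a small sketch. This is not the authors' estimator, which the abstract does not specify. It assumes that query embeddings are uniformly distributed on the unit sphere and that a document token "wins" a query direction when it has the highest dot product, as in ColBERT's maximum-similarity scoring. The function names `estimate_cell_shares` and `prune_tokens` are hypothetical, and the Monte Carlo sampling is only one possible way to approximate cell size.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def random_unit_vector(dim, rng):
    # Normalized Gaussian samples are uniformly distributed on the unit sphere.
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def estimate_cell_shares(token_embeddings, num_samples=20000, seed=0):
    """Monte Carlo estimate of the fraction of query directions each token wins.

    Assumption (not from the paper): queries are uniform on the unit sphere.
    """
    rng = random.Random(seed)
    dim = len(token_embeddings[0])
    wins = [0] * len(token_embeddings)
    for _ in range(num_samples):
        query = random_unit_vector(dim, rng)
        scores = [sum(q * t for q, t in zip(query, tok)) for tok in token_embeddings]
        # The sampled query falls in the cell of the best-matching token.
        wins[scores.index(max(scores))] += 1
    return [w / num_samples for w in wins]


def prune_tokens(token_embeddings, keep, num_samples=20000, seed=0):
    """Keep the `keep` tokens with the largest estimated cell share."""
    shares = estimate_cell_shares(token_embeddings, num_samples, seed)
    ranked = sorted(range(len(token_embeddings)), key=lambda i: shares[i], reverse=True)
    return sorted(ranked[:keep]), shares


def unit_vector_at(angle_degrees):
    """Toy 2-D token embedding on the unit circle at the given angle."""
    rad = math.radians(angle_degrees)
    return [math.cos(rad), math.sin(rad)]


# Toy document: four 2-D token embeddings at 0, 10, 20 and 180 degrees.
tokens = [unit_vector_at(a) for a in (0, 10, 20, 180)]
kept, shares = prune_tokens(tokens, keep=3)
print("estimated cell shares:", [round(s, 3) for s in shares])
print("kept token indices:", kept)
```

In this toy setup the token at 10 degrees is squeezed between its two neighbours, so its cell covers only a 10-degree arc of the circle and it is the first candidate for pruning. Real ColBERT embeddings are high-dimensional and real queries are not uniformly distributed, so a practical estimator would need a more realistic query distribution.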

Inference & Quantization | Natural Language Processing | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
