Feb 18, 2026 | arXiv:2602.16609

ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models

Antoine Chaffin, Luca Arnaboldi, Amélie Chatelain, Florent Krzakala

AI Summary

The paper investigates the impact of large-scale pre-training on multi-vector models such as ColBERT. It shows that a ColBERT model pre-trained from scratch on public data (ColBERT-Zero) outperforms models obtained by applying a small knowledge distillation step on top of strong single-vector models, including ones pre-trained on closed data. The authors also find that adding a supervised step before knowledge distillation brings performance much closer to that of full pre-training, and that aligning the pre-training and fine-tuning setups is crucial. The resulting ColBERT-Zero model sets a new state of the art for its size, surpassing GTE-ModernColBERT and its base model, GTE-ModernBERT.

Key Contribution

Pre-training ColBERT from scratch on public data alone beats models obtained by a small knowledge distillation step on top of strong single-vector models, even when those base models were trained on closed, much stronger data.

Abstract

Current state-of-the-art multi-vector models are obtained through a small Knowledge Distillation (KD) training step on top of strong single-vector models, leveraging the large-scale pre-training of these models. In this paper, we study the pre-training of multi-vector models and show that large-scale multi-vector pre-training yields much stronger multi-vector models. Notably, a fully ColBERT-pre-trained model, ColBERT-Zero, trained only on public data, outperforms GTE-ModernColBERT as well as its base model, GTE-ModernBERT, which leverages closed and much stronger data, setting a new state of the art for models of this size. We also find that, although performing only a small KD step is not enough to achieve results close to full pre-training, adding a supervised step beforehand yields much closer performance while skipping the most costly unsupervised phase. Finally, we find that aligning the fine-tuning and pre-training setups is crucial when repurposing existing models. To enable exploration of our results, we release various checkpoints as well as the code used to train them.
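For readers unfamiliar with the distinction the abstract relies on, the following is a minimal sketch of how single-vector and ColBERT-style multi-vector scoring differ. It is not taken from the paper or its released code: the function names and the toy 2-dimensional vectors are assumptions made for illustration, and real models use learned, normalized token embeddings of much higher dimension. The multi-vector score shown is the standard late-interaction (MaxSim) operator: for each query token, take the best-matching document token, then sum over query tokens.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def single_vector_score(query_vec: Vector, doc_vec: Vector) -> float:
    """Single-vector retrieval: one embedding per text, one dot product."""
    return dot(query_vec, doc_vec)


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Multi-vector (late-interaction) retrieval: one embedding per token.

    Each query token is matched to its most similar document token,
    and the per-token maxima are summed.
    """
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token embeddings (hypothetical values, chosen only for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # covers both query tokens
doc_b = [[1.0, 0.0], [0.7, 0.1]]              # covers only the first one

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a > score_b)  # doc_a ranks above doc_b
```

Because each query token is scored independently, a document that matches only part of the query (doc_b here) is penalized for the tokens it misses, which a single pooled vector cannot express as directly.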

Natural Language Processing, Recommendation & Information Retrieval, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 14

Year: 2026

Venue: N/A
