Korea U | Samsung Mobile eXperience Business | Apr 6, 2026 | arXiv:2604.04734

Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval

Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, Heuiseok Lim

AI Summary

This paper investigates knowledge distillation (KD) for dense retrieval, arguing that current methods overemphasize hard negatives at the expense of capturing the full teacher score distribution. The authors propose Stratified Sampling, a technique that samples uniformly across the teacher's score spectrum to better preserve the variance and entropy of the teacher's scores. Experiments show that Stratified Sampling significantly outperforms top-K (hard negative) sampling and random sampling on both in-domain and out-of-domain benchmarks, highlighting the importance of emulating the teacher's comprehensive preference structure.

Key Contribution

Forget just mining hard negatives: the secret to better knowledge distillation for retrieval lies in matching the *entire* score distribution of your teacher model.

Abstract

Transferring knowledge from a cross-encoder teacher via Knowledge Distillation (KD) has become a standard paradigm for training retrieval models. While existing studies have largely focused on mining hard negatives to improve discrimination, the systematic composition of training data and the resulting teacher score distribution have received relatively less attention. In this work, we highlight that focusing solely on hard negatives prevents the student from learning the comprehensive preference structure of the teacher, potentially hampering generalization. To effectively emulate the teacher score distribution, we propose a Stratified Sampling strategy that uniformly covers the entire score spectrum. Experiments on in-domain and out-of-domain benchmarks confirm that Stratified Sampling, which preserves the variance and entropy of teacher scores, serves as a robust baseline, significantly outperforming top-K and random sampling in diverse settings. These findings suggest that the essence of distillation lies in preserving the diverse range of relative scores perceived by the teacher.
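The contrast between top-K and stratified selection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how strata are formed, so this version assumes equal-size rank bins over the score-sorted candidates with one candidate drawn per bin. The function names `top_k_sample` and `stratified_sample`, the synthetic scores, and the use of population variance as the diagnostic are all hypothetical choices made for illustration.

```python
import random
import statistics


def top_k_sample(scores, k):
    """Return indices of the k highest-scoring candidates (hard negatives)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def stratified_sample(scores, k, rng):
    """Return k candidate indices spread across the teacher's score range.

    Assumption (not from the paper): strata are k equal-size rank bins over
    the score-sorted candidates, and one candidate is drawn from each bin.
    """
    n = len(scores)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(scores)")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    picks = []
    for b in range(k):
        lo = b * n // k          # first rank in bin b
        hi = (b + 1) * n // k    # one past the last rank in bin b
        picks.append(order[rng.randrange(lo, hi)])
    return picks


# Hypothetical teacher scores for 100 candidate passages of one query.
rng = random.Random(0)
scores = [10 - 0.2 * i for i in range(100)]
rng.shuffle(scores)

top = top_k_sample(scores, 8)
strat = stratified_sample(scores, 8, rng)

# Compare how much of the teacher's score spread each selection retains.
print("top-K variance:     ", statistics.pvariance([scores[i] for i in top]))
print("stratified variance:", statistics.pvariance([scores[i] for i in strat]))
```

Rank-based bins guarantee that every stratum is non-empty whenever k does not exceed the number of candidates. Equal-width bins over the raw score range would be another plausible reading of "uniformly covers the entire score spectrum", but they can leave some bins empty when scores cluster.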

Inference & Quantization | Recommendation & Information Retrieval | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
