The paper introduces DiffusionRank, a generative learning-to-rank (LTR) approach based on denoising diffusion that models the joint distribution of feature vectors and relevance labels. This contrasts with traditional discriminative LTR methods, which model the conditional probability of relevance given features. The authors hypothesize that learning the full data distribution yields more robust ranking models, and report significant empirical improvements over discriminative counterparts.
Denoising diffusion models can significantly outperform discriminative methods in learning-to-rank, suggesting a new path for improving information retrieval.
In information retrieval (IR), learning-to-rank (LTR) methods have traditionally limited themselves to discriminative machine learning approaches that model the probability of the document being relevant to the query given some feature representation of the query-document pair. In this work, we propose an alternative denoising diffusion-based deep generative approach to LTR that instead models the full joint distribution over feature vectors and relevance labels. While in the discriminative setting, an over-parameterized ranking model may find different ways to fit the training data, we hypothesize that candidate solutions that can explain the full data distribution under the generative setting produce more robust ranking models. With this motivation, we propose DiffusionRank that extends TabDiff, an existing denoising diffusion-based generative model for tabular datasets, to create generative equivalents of classical discriminative pointwise and pairwise LTR objectives. Our empirical results demonstrate significant improvements from DiffusionRank models over their discriminative counterparts. Our work points to a rich space for future research exploration on how we can leverage ongoing advancements in deep generative modeling approaches, such as diffusion, for learning-to-rank in IR.
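To make the discriminative-vs-generative contrast concrete, the sketch below ranks documents by modeling the joint distribution p(features, relevance) and deriving p(relevant | features) via Bayes' rule. It is a minimal illustration only: class-conditional Gaussians stand in for the TabDiff-style diffusion model the paper actually uses, and all data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy query-document features: relevant docs (y=1) cluster near +1,
# non-relevant docs (y=0) near -1, in a 2-D feature space.
X_rel = rng.normal(loc=1.0, scale=0.7, size=(200, 2))
X_non = rng.normal(loc=-1.0, scale=0.7, size=(200, 2))
X = np.vstack([X_rel, X_non])
y = np.concatenate([np.ones(200), np.zeros(200)])

def fit_joint(X, y):
    # Generative pointwise LTR stand-in: model p(x, y) = p(y) p(x | y)
    # with per-class diagonal Gaussians (a simple proxy for a diffusion
    # model over concatenated feature/label vectors).
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6, len(Xc) / len(X))
    return params

def log_gauss(x, mu, var):
    # Log-density of a diagonal Gaussian, summed over feature dimensions.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)

def relevance_score(params, x):
    # Rank by p(y=1 | x), derived from the learned joint via Bayes' rule,
    # with a log-sum-exp for numerical stability.
    log_joint = {c: log_gauss(x, mu, var) + np.log(prior)
                 for c, (mu, var, prior) in params.items()}
    m = np.maximum(log_joint[0], log_joint[1])
    denom = m + np.log(np.exp(log_joint[0] - m) + np.exp(log_joint[1] - m))
    return np.exp(log_joint[1] - denom)

params = fit_joint(X, y)
docs = np.array([[1.2, 0.8], [-1.1, -0.9], [0.1, 0.2]])  # candidate documents
scores = relevance_score(params, docs)
ranking = np.argsort(-scores)  # doc indices, most relevant first
print(ranking)
```

A discriminative pointwise model would instead fit p(y | x) directly (e.g. logistic regression on the same features); the paper's hypothesis is that fitting the full joint distribution constrains the model toward more robust solutions.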