This paper compares the fairness of reasoning-based rerankers (like Rank1) against non-reasoning rerankers using the TREC 2022 Fair Ranking Track dataset. The authors evaluate six reranking models across various retrieval settings and demographic attributes, using Attention-Weighted Rank Fairness (AWRF) as the primary fairness metric. The key finding is that reasoning rerankers neither improve nor harm fairness compared to non-reasoning approaches, with AWRF scores remaining stable despite substantial variations in relevance.
Reasoning rerankers don't magically fix fairness issues in search, preserving the biases of their input rankings despite boosting relevance.
While reasoning rerankers such as Rank1 have demonstrated strong gains in ranking relevance, it is unclear how they perform on other retrieval qualities such as fairness. We conduct the first systematic comparison of fairness between reasoning and non-reasoning rerankers. Using the TREC 2022 Fair Ranking Track dataset, we evaluate six reranking models across multiple retrieval settings and demographic attributes. Our findings demonstrate that reasoning rerankers neither improve nor harm fairness compared to non-reasoning approaches. Our fairness metric, Attention-Weighted Rank Fairness (AWRF), remained stable (0.33-0.35) across all models, even as relevance varied substantially (nDCG 0.247-1.000). Demographic breakdown analysis revealed fairness gaps for geographic attributes regardless of model architecture. These results indicate that future work on specializing reasoning models to be aware of fairness attributes could lead to improvements, as current implementations preserve the fairness characteristics of their input rankings.
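To make the AWRF metric concrete, here is a minimal sketch of how an attention-weighted fairness score can be computed. The exact attention weighting and distance function used in the paper are not specified here; this sketch assumes a logarithmic position discount and total-variation distance, with the score flipped so that higher means fairer (the `awrf` function name and its parameters are illustrative, not the paper's implementation).

```python
import math


def awrf(ranking_groups, target_dist, weight=lambda r: 1.0 / math.log2(r + 1)):
    """Sketch of an Attention-Weighted Rank Fairness score.

    ranking_groups: list of group labels, one per ranked document (rank 1 first).
    target_dist: dict mapping group -> desired exposure share (sums to 1).

    Each rank position receives an attention weight (here a log discount,
    an assumption). The attention-weighted group exposure distribution is
    compared to the target distribution with total-variation distance,
    and 1 - distance is returned so that 1.0 means perfectly fair.
    """
    exposure = {g: 0.0 for g in target_dist}
    total = 0.0
    for rank, group in enumerate(ranking_groups, start=1):
        w = weight(rank)
        total += w
        if group in exposure:
            exposure[group] += w
    # Normalize exposure into a distribution over groups.
    dist = {g: e / total for g, e in exposure.items()}
    # Total-variation distance to the target distribution.
    tv = 0.5 * sum(abs(dist[g] - target_dist[g]) for g in target_dist)
    return 1.0 - tv


# Example: a ranking that shows only group "A" when the target is a
# 50/50 split between "A" and "B" scores 0.5; a perfectly matched
# single-group case scores 1.0.
print(awrf(["A", "A", "A"], {"A": 0.5, "B": 0.5}))  # 0.5
print(awrf(["A", "A"], {"A": 1.0}))                 # 1.0
```

Because the position weights concentrate attention at the top of the list, a reranker can change which documents appear first without changing the group-level exposure much, which is consistent with the paper's observation that AWRF stays stable while relevance varies.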