aampe · Independent Researcher · Feb 16, 2026 · arXiv:2602.14914

Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation

AI Summary

This paper theoretically compares Self-Normalised Inverse Propensity Scoring (SNIPS) with additive control variates for off-policy evaluation (OPE), proving that an estimator with an optimal additive baseline, $β^\star$-IPS, asymptotically dominates SNIPS in Mean Squared Error. The authors analytically decompose the variance gap between SNIPS and $β^\star$-IPS, demonstrating that SNIPS is asymptotically equivalent to using a specific, but generally suboptimal, additive baseline. These findings provide a theoretical basis for preferring optimal baseline corrections over self-normalisation in ranking and recommendation systems.
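The equivalence and the variance gap mentioned above can be sketched with a standard first-order (delta-method) argument. The notation here is an assumption, not taken from the paper: $w_i$ is the importance weight of logged sample $i$ (with $\mathbb{E}[w]=1$ under the logging policy), $r_i$ its reward, $V=\mathbb{E}[wr]$ the target policy's value, and $\hat V_\beta$ the IPS estimator with additive baseline $\beta$. The paper's own definitions and proof may differ in detail.

$$
\begin{aligned}
\hat V_\beta &= \beta + \frac{1}{n}\sum_{i=1}^{n} w_i\,(r_i-\beta), \qquad \mathbb{E}[\hat V_\beta] = V \ \text{ for any fixed } \beta,\\
n\,\mathrm{Var}(\hat V_\beta) &= \mathrm{Var}(wr) - 2\beta\,\mathrm{Cov}(wr,w) + \beta^2\,\mathrm{Var}(w),\\
\beta^\star &= \arg\min_\beta \mathrm{Var}(\hat V_\beta) = \frac{\mathrm{Cov}(wr,w)}{\mathrm{Var}(w)},\\
\hat V_{\mathrm{SNIPS}} &= \frac{\sum_i w_i r_i}{\sum_i w_i} = V + \frac{1}{n}\sum_{i=1}^{n} w_i\,(r_i - V) + O_p(n^{-1}),\\
n\left[\mathrm{Var}(\hat V_{\beta=V}) - \mathrm{Var}(\hat V_{\beta^\star})\right] &= \mathrm{Var}(w)\,(V-\beta^\star)^2 \;\ge\; 0 .
\end{aligned}
$$

Under these assumptions, SNIPS behaves to first order like the additive baseline $\beta = V$, and the gap vanishes only when $V = \beta^\star$ or when the weights have no variance.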

Key Contribution

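An optimal additive baseline ($β^\star$-IPS) provably achieves asymptotic Mean Squared Error no worse than SNIPS in off-policy evaluation, because SNIPS is asymptotically equivalent to a generally suboptimal additive baseline.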

Abstract

Off-policy evaluation (OPE) is essential for assessing ranking and recommendation systems without costly online interventions. Self-Normalised Inverse Propensity Scoring (SNIPS) is a standard tool for variance reduction in OPE, leveraging a multiplicative control variate. Recent advances in off-policy learning suggest that additive control variates (baseline corrections) may offer superior performance, yet theoretical guarantees for evaluation are lacking. This paper provides a definitive answer: we prove that $β^\star$-IPS, an estimator with an optimal additive baseline, asymptotically dominates SNIPS in Mean Squared Error. By analytically decomposing the variance gap, we show that SNIPS is asymptotically equivalent to using a specific -- but generally sub-optimal -- additive baseline. Our results theoretically justify shifting from self-normalisation to optimal baseline corrections for both ranking and recommendation.
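As a rough illustration of the three estimators the abstract compares, here is a minimal NumPy sketch on a toy context-free bandit. The function names, the three-action policies, the Bernoulli rewards, and the plug-in estimate of the baseline are all assumptions made for this example; they use the textbook forms of IPS, SNIPS, and baseline-corrected IPS rather than the paper's own code or experimental setup.

```python
import numpy as np


def ips(w, r):
    """Vanilla IPS: sample mean of importance-weighted rewards."""
    return float(np.mean(w * r))


def snips(w, r):
    """Self-normalised IPS: divide by the sum of weights instead of n."""
    return float(np.sum(w * r) / np.sum(w))


def beta_ips(w, r, beta):
    """IPS with an additive baseline beta (unbiased for fixed beta when E[w] = 1)."""
    return float(beta + np.mean(w * (r - beta)))


def optimal_beta(w, r):
    """Plug-in estimate of Cov(w*r, w) / Var(w), the variance-minimising baseline.

    Estimating beta on the same sample adds a small O(1/n) bias (assumption:
    acceptable for this illustration).
    """
    wr = w * r
    var_w = np.var(w)
    if var_w == 0.0:  # all weights equal: any baseline gives the same estimate
        return 0.0
    return float(np.mean((wr - wr.mean()) * (w - w.mean())) / var_w)


def simulate(n=2000, reps=500, seed=0):
    """Monte Carlo MSE of each estimator on a hypothetical 3-action bandit."""
    rng = np.random.default_rng(seed)
    pi0 = np.array([0.6, 0.3, 0.1])      # logging policy (assumed)
    pi = np.array([0.1, 0.3, 0.6])       # target policy (assumed)
    mean_r = np.array([0.2, 0.5, 0.9])   # Bernoulli reward means (assumed)
    true_value = float(pi @ mean_r)

    sq_err = {"ips": [], "snips": [], "beta_star_ips": []}
    for _ in range(reps):
        a = rng.choice(3, size=n, p=pi0)             # logged actions
        r = rng.binomial(1, mean_r[a]).astype(float)  # logged rewards
        w = pi[a] / pi0[a]                           # importance weights
        estimates = {
            "ips": ips(w, r),
            "snips": snips(w, r),
            "beta_star_ips": beta_ips(w, r, optimal_beta(w, r)),
        }
        for name, est in estimates.items():
            sq_err[name].append((est - true_value) ** 2)
    return true_value, {k: float(np.mean(v)) for k, v in sq_err.items()}


if __name__ == "__main__":
    value, mse = simulate()
    print("true value:", value)
    for name, err in mse.items():
        print(name, err)
```

In this toy setup the population-optimal baseline works out to 0.92 while the true value is 0.71, so the baseline that SNIPS implicitly uses is not the optimal one. The simulated MSEs are noisy estimates from one seed and should be read as a sanity check, not as evidence for the paper's theorem.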

Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
