Feb 18, 2026

arXiv:2602.16476

Learning Preference from Observed Rankings

Yu-Chang Chen, Chen Chian Fuh, Shang En Tsai

AI Summary

This paper introduces a framework for learning individual consumer preferences from partial ranking data by modeling observed rankings as pairwise comparisons with logistic choice probabilities. The latent utility is modeled as a sum of interpretable product attributes, item fixed effects, and a low-rank user-item factor structure, while addressing exposure bias by modeling pair observability as the product of item-level observability propensities. The method estimates preference parameters by maximizing an inverse-probability-weighted (IPW), ridge-regularized log-likelihood and uses a stochastic gradient descent (SGD) algorithm based on inverse-probability resampling for scalability.
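The estimator summarized above can be written out as follows. This is only a sketch: the notation (consumer index $c$, items $i, j$, attribute vector $x_i$, factors $p_c, q_i$, propensities $\pi_i$, penalty $\lambda$) is assumed here for illustration and is not taken from the paper, and which parameters the ridge penalty covers is likewise an assumption.

```latex
% Latent utility: attributes + item fixed effect + low-rank user-item term
u_{ci} = x_i^\top \beta + \alpha_i + p_c^\top q_i

% Logistic pairwise choice probability (\sigma is the logistic function)
\Pr(i \succ_c j) = \sigma\left(u_{ci} - u_{cj}\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Pair observability as a product of item-level propensities, and the IPW weight
\pi_{ij} = \pi_i \, \pi_j, \qquad w_{ij} = \frac{1}{\pi_i \, \pi_j}

% IPW, ridge-regularized log-likelihood over observed comparisons \mathcal{D},
% where (c, i, j) \in \mathcal{D} means consumer c ranked i above j
\hat{\theta} = \arg\max_{\theta}
  \sum_{(c,i,j) \in \mathcal{D}} w_{ij} \, \log \sigma\left(u_{ci} - u_{cj}\right)
  - \frac{\lambda}{2} \lVert \theta \rVert_2^2,
\qquad \theta = (\beta, \alpha, P, Q)
```

Under this reading, comparisons involving rarely observed items get larger weights $w_{ij}$, which is how the reweighting counteracts exposure bias toward frequently encountered items.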

Key Contribution

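Corrects for exposure bias in recommendation systems by modeling item observability and applying inverse-probability weighting, which improves out-of-sample recommendations relative to a popularity-based benchmark, especially for products a consumer has not previously purchased.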

Abstract

Estimating consumer preferences is central to many problems in economics and marketing. This paper develops a flexible framework for learning individual preferences from partial ranking information by interpreting observed rankings as collections of pairwise comparisons with logistic choice probabilities. We model latent utility as the sum of interpretable product attributes, item fixed effects, and a low-rank user-item factor structure, enabling both interpretability and information sharing across consumers and items. We further correct for selection in which comparisons are observed: a comparison is recorded only if both items enter the consumer's consideration set, inducing exposure bias toward frequently encountered items. We model pair observability as the product of item-level observability propensities and estimate these propensities with a logistic model for the marginal probability that an item is observable. Preference parameters are then estimated by maximizing an inverse-probability-weighted (IPW), ridge-regularized log-likelihood that reweights observed comparisons toward a target comparison population. To scale computation, we propose a stochastic gradient descent (SGD) algorithm based on inverse-probability resampling, which draws comparisons in proportion to their IPW weights. In an application to transaction data from an online wine retailer, the method improves out-of-sample recommendation performance relative to a popularity-based benchmark, with particularly strong gains in predicting purchases of previously unconsumed products.
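A minimal Python sketch of the resampling idea described in the abstract, not the authors' implementation. It assumes the item-level propensities are already estimated and passed in as `item_prop` (the paper fits them with a logistic model, which is omitted here). The function names, toy data, learning rate, and the choice to apply the ridge penalty only to the parameters touched at each step are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def utility(u, i, X, beta, alpha, P, Q):
    """Latent utility: attributes + item fixed effect + low-rank user-item term."""
    return X[i] @ beta + alpha[i] + P[u] @ Q[i]

def pair_weights(winners, losers, item_prop):
    """IPW weight of a comparison: 1 / (propensity of winner * propensity of loser)."""
    return 1.0 / (item_prop[winners] * item_prop[losers])

def fit(users, winners, losers, X, item_prop, n_users,
        rank=2, lam=0.01, lr=0.05, steps=2000, seed=0):
    """SGD where comparisons are drawn in proportion to their IPW weights."""
    rng = np.random.default_rng(seed)
    n_items, n_attr = X.shape
    beta = np.zeros(n_attr)
    alpha = np.zeros(n_items)
    P = 0.1 * rng.standard_normal((n_users, rank))
    Q = 0.1 * rng.standard_normal((n_items, rank))

    # Inverse-probability resampling: sample index k with probability w_k / sum(w).
    w = pair_weights(winners, losers, item_prop)
    draws = rng.choice(len(w), size=steps, p=w / w.sum())

    for k in draws:
        u, i, j = users[k], winners[k], losers[k]
        diff = (utility(u, i, X, beta, alpha, P, Q)
                - utility(u, j, X, beta, alpha, P, Q))
        # Derivative of -log(sigmoid(diff)) with respect to diff.
        g = sigmoid(diff) - 1.0
        pu, qi, qj = P[u].copy(), Q[i].copy(), Q[j].copy()
        # Gradient steps; ridge shrinkage is applied only to touched
        # parameters (a simplification assumed for this sketch).
        beta -= lr * (g * (X[i] - X[j]) + lam * beta)
        alpha[i] -= lr * (g + lam * alpha[i])
        alpha[j] -= lr * (-g + lam * alpha[j])
        P[u] -= lr * (g * (qi - qj) + lam * pu)
        Q[i] -= lr * (g * pu + lam * qi)
        Q[j] -= lr * (-g * pu + lam * qj)
    return beta, alpha, P, Q

# Toy data (hypothetical): 3 users, 3 items, every user ranks 0 > 1 > 2.
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
item_prop = np.array([0.9, 0.5, 0.2])   # assumed, not estimated here
users = np.array([0, 1, 2, 0, 1, 2])
winners = np.array([0, 0, 0, 1, 1, 1])
losers = np.array([1, 1, 1, 2, 2, 2])

beta, alpha, P, Q = fit(users, winners, losers, X, item_prop, n_users=3)
```

Because sampling already follows the IPW weights, each drawn comparison takes an unweighted gradient step. In this sketch that stands in for multiplying every gradient by its weight, and it keeps step sizes bounded when some propensities are very small.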

Recommendation & Information Retrieval

RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
