This paper studies the prophet inequality problem where rewards are observed through noisy realizations and reward distributions are unknown, a more realistic setting than previously studied. The authors propose algorithms based on lower-confidence-bound (LCB) thresholding to integrate learning and decision-making. They achieve a competitive ratio of $1 - 1/e$ in the i.i.d. setting and $1/2$ in the non-i.i.d. setting, demonstrating the feasibility of near-optimal online decision-making with noisy reward observations.
Even with noisy reward observations and unknown reward distributions, near-optimal online decision-making is possible using LCB thresholding, achieving competitive ratios of $1 - 1/e$ and $1/2$ in i.i.d. and non-i.i.d. settings, respectively.
We study the prophet inequality, a fundamental problem in online decision-making and optimal stopping, in a practical setting where rewards are observed only through noisy realizations and reward distributions are unknown. At each stage, the decision-maker receives a noisy reward whose true value follows a linear model with an unknown latent parameter, and observes a feature vector drawn from a distribution. To address this challenge, we propose algorithms that integrate learning and decision-making via lower-confidence-bound (LCB) thresholding. In the i.i.d. setting, we establish that both an Explore-then-Decide strategy and an $\varepsilon$-Greedy variant achieve the sharp competitive ratio of $1 - 1/e$, under a mild condition on the optimal value. For non-identical distributions, we show that a competitive ratio of $1/2$ can be guaranteed against a relaxed benchmark. Moreover, with limited window access to past rewards, the tight ratio of $1/2$ against the optimal benchmark is achieved.
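To make the Explore-then-Decide idea concrete, here is a minimal sketch, not the paper's algorithm: an exploration phase fits the latent parameter of the linear reward model by ridge regression, and a decision phase accepts the first item whose lower confidence bound on the mean reward clears a threshold. All constants (exploration length, confidence width `beta`, threshold quantile) are illustrative assumptions, and the paper's precise threshold and confidence construction differ.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, m = 3, 50, 15           # feature dim, horizon, exploration length (illustrative)
theta = rng.normal(size=d)    # unknown latent parameter (hidden from the algorithm)

def noisy_reward(x):
    # true mean reward is linear in the features; observation is noisy
    return x @ theta + rng.normal(scale=0.1)

# --- Explore phase: observe m stages, estimate theta by ridge regression ---
X = rng.normal(size=(m, d))
y = np.array([noisy_reward(x) for x in X])
lam = 1.0
V = X.T @ X + lam * np.eye(d)            # regularized Gram matrix
theta_hat = np.linalg.solve(V, X.T @ y)  # ridge estimate of theta
beta = 1.0                               # confidence width (illustrative)

def lcb(x):
    # lower confidence bound on the true mean reward <theta, x>
    width = beta * np.sqrt(x @ np.linalg.solve(V, x))
    return x @ theta_hat - width

# Threshold chosen here as a quantile of estimated exploration rewards
# (a stand-in for the paper's threshold, which we do not reproduce).
tau = np.quantile(X @ theta_hat, 0.8)

# --- Decide phase: accept the first item whose LCB clears the threshold ---
chosen = None
for t in range(m, n):
    x_t = rng.normal(size=d)
    if lcb(x_t) >= tau:
        chosen = (t, noisy_reward(x_t))
        break

print(chosen)
```

The LCB makes acceptance conservative: an item is taken only when even a pessimistic estimate of its mean reward beats the threshold, which is how learning error is kept from inflating the stopping decision.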