Search papers, labs, and topics across Lattice.
This paper introduces a novel algorithm, SME-OFU, for stochastic linear contextual bandits (SLCB) that leverages bounded reward noise through set-membership estimation (SME) and the principle of optimism in the face of uncertainty (OFU). By utilizing the stronger condition of bounded noise, the algorithm achieves a significantly improved regret bound of \(O(\log T)\), contrasting with the traditional \(\tilde{O}(\sqrt{T})\) bound associated with sub-Gaussian noise. Empirical results demonstrate that SME-OFU outperforms existing algorithms tailored for sub-Gaussian noise in scenarios with bounded reward noise, highlighting its practical advantages in real-world applications.
Bounded reward noise can lead to a dramatic reduction in regret bounds, achieving \(O(\log T)\) versus the standard \(\tilde{O}(\sqrt{T})\) for sub-Gaussian conditions.
This paper considers stochastic linear contextual bandits (SLCB) with bounded reward noise. Existing works typically assume sub-Gaussian reward noise and bounded expected rewards, under which the optimal regret bound scales as $\tilde{O}(\sqrt{T})$ in terms of horizon $T$. However, in many applications, realized/observed rewards are also naturally bounded, implying bounded reward noise. Bounded noise is more informative than the sub-Gaussian condition but has not been leveraged explicitly in the SLCB literature. In this paper, we propose a novel algorithm SME-OFU by utilizing an uncertainty quantification method called set-membership estimation (SME) and applying the principle of optimism in the face of uncertainty (OFU). Our algorithm enjoys an improved regret bound $O(\log T)$. Notice that this does not contradict the existing optimal bound $\tilde{O}(\sqrt{T})$ for sub-Gaussian noise because bounded noise is a stronger condition. Finally, simulations show empirical improvements of SME-OFU over a benchmark algorithm designed for sub-Gaussian noise when the reward noise is bounded.