UIUCJun 18, 2026arXiv:2606.20022

Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

AI Summary

This paper introduces a novel algorithm, SME-OFU, for stochastic linear contextual bandits (SLCB) that leverages bounded reward noise through set-membership estimation (SME) and the principle of optimism in the face of uncertainty (OFU). By utilizing the stronger condition of bounded noise, the algorithm achieves a significantly improved regret bound of $O(\log T)$, contrasting with the traditional $\tilde{O}(\sqrt{T})$ bound associated with sub-Gaussian noise. Empirical results demonstrate that SME-OFU outperforms existing algorithms tailored for sub-Gaussian noise in scenarios with bounded reward noise, highlighting its practical advantages in real-world applications.

Key Contribution

Bounded reward noise can lead to a dramatic reduction in regret bounds, achieving $O(\log T)$ versus the standard $\tilde{O}(\sqrt{T})$ for sub-Gaussian conditions.

Abstract

This paper considers stochastic linear contextual bandits (SLCB) with bounded reward noise. Existing works typically assume sub-Gaussian reward noise and bounded expected rewards, under which the optimal regret bound scales as $\tilde{O}(\sqrt{T})$ in terms of horizon $T$. However, in many applications, realized/observed rewards are also naturally bounded, implying bounded reward noise. Bounded noise is more informative than the sub-Gaussian condition but has not been leveraged explicitly in the SLCB literature. In this paper, we propose a novel algorithm SME-OFU by utilizing an uncertainty quantification method called set-membership estimation (SME) and applying the principle of optimism in the face of uncertainty (OFU). Our algorithm enjoys an improved regret bound $O(\log T)$. Notice that this does not contradict the existing optimal bound $\tilde{O}(\sqrt{T})$ for sub-Gaussian noise because bounded noise is a stronger condition. Finally, simulations show empirical improvements of SME-OFU over a benchmark algorithm designed for sub-Gaussian noise when the reward noise is bounded.

Recommendation & Information Retrieval Scalable Oversight & Alignment Theory Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

Related Papers