Search papers, labs, and topics across Lattice.
SequeL, INRIA Lille -Nord
3
0
3
Learning in multi-armed bandits gets a boost: even with only probabilistic side observations of other arms' losses, near-optimal regret is achievable without knowing the observation probability.
Learning user preferences for thousands of items can be achieved with just a handful of evaluations, thanks to a novel approach that leverages effective dimension in graph-based bandit problems.
Learning from noisy feedback doesn't have to be a guessing game: this new algorithm achieves near-optimal regret in online learning without needing to estimate the quality of the feedback.