UPenn | Feb 24, 2026 | arXiv:2602.20465

Prior-Agnostic Incentive-Compatible Exploration

Ramya Ramalingam, Osbert Bastani, Aaron Roth

AI Summary

This paper addresses the incentive misalignment that arises in bandit settings when a principal recommends actions to agents who do not themselves benefit from exploration. The authors show that (weighted) swap regret bounds suffice to make agents faithfully follow recommendations in an approximate Bayes Nash equilibrium. This holds even in dynamic environments, when agents have conflicting prior beliefs, and when the mechanism designer has no knowledge of those beliefs. The key assumption is that agents are uncertain about their arrival time. The authors also give concrete algorithms with adaptive and weighted regret guarantees in bandit settings.
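To make the central quantity concrete, here is a minimal full-information sketch of weighted swap regret. The function name `weighted_swap_regret`, the toy rewards, and the weight vectors are illustrative assumptions, not the paper's construction; in the paper, agents are uncertain about rewards and the algorithm only receives bandit feedback. The weights stand in for an agent's belief about its arrival time. A swap rule is any map from "recommended action" to "action I actually take", which is exactly the kind of deviation an agent who sees a recommendation can make.

```python
from typing import Sequence


def weighted_swap_regret(
    recs: Sequence[int],
    rewards: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Normalized weighted swap regret of a recommendation sequence.

    recs[t]: action recommended in round t
    rewards[t][b]: reward action b would have earned in round t (full information)
    weights[t]: nonnegative weight on round t, e.g. an agent's belief that it arrives at t
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must have positive total mass")
    n_actions = len(rewards[0])
    total = 0.0
    # A swap rule maps each recommended action a to some replacement b. The
    # objective separates across a, so the best rule picks, independently for
    # each a, the b with the largest weighted gain on rounds where a was recommended.
    for a in range(n_actions):
        rounds = [t for t, r in enumerate(recs) if r == a]
        best_gain = max(
            sum(weights[t] * (rewards[t][b] - rewards[t][a]) for t in rounds)
            for b in range(n_actions)
        )
        total += best_gain  # choosing b == a contributes 0, so best_gain >= 0
    return total / total_weight


# Hypothetical toy data: action 0 always pays 1.0, action 1 always pays 0.0.
# The principal explores action 1 in round 0 and recommends action 0 afterwards.
T = 10
rewards = [[1.0, 0.0] for _ in range(T)]
recs = [1] + [0] * (T - 1)

knows_arrival = [1.0] + [0.0] * (T - 1)  # agent is certain it arrives in round 0
uncertain = [1.0] * T                    # agent's arrival time is uniform over rounds

print(weighted_swap_regret(recs, rewards, knows_arrival))  # 1.0
print(weighted_swap_regret(recs, rewards, uncertain))      # 0.1
```

The two calls illustrate why arrival-time uncertainty matters. An agent who knows it is the one being asked to explore gains a full unit by ignoring the recommendation, so no small regret bound can protect it. An agent spread uniformly over ten rounds gains at most 0.1 from any deviation rule, which is the kind of small quantity an approximate equilibrium can absorb.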

Key Contribution

No common prior required: (weighted) swap regret bounds alone can align incentives for exploration in dynamic bandit settings with heterogeneous agent beliefs, provided agents are uncertain about their arrival time.

Abstract

In bandit settings, optimizing long-term regret metrics requires exploration, which corresponds to sometimes taking myopically sub-optimal actions. When a long-lived principal merely recommends actions to be executed by a sequence of different agents (as in an online recommendation platform), this creates an incentive misalignment: exploration is "worth it" for the principal but not for the agents. Prior work studies regret minimization under the constraint of Bayesian Incentive-Compatibility in a static stochastic setting with a fixed and common prior shared amongst the agents and the algorithm designer. We show that (weighted) swap regret bounds on their own suffice to cause agents to faithfully follow forecasts in an approximate Bayes Nash equilibrium, even in dynamic environments in which agents have conflicting prior beliefs and the mechanism designer has no knowledge of any agent's beliefs. To obtain these bounds, it is necessary to assume that the agents have some degree of uncertainty not just about the rewards, but about their arrival time, i.e., their relative position in the sequence of agents served by the algorithm. We instantiate our abstract bounds with concrete algorithms for guaranteeing adaptive and weighted regret in bandit settings.
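The mention of dynamic environments and adaptive regret can be illustrated with a toy computation. Adaptive regret is the worst regret over any contiguous window of rounds, so an algorithm cannot hide poor performance after an environment change behind good performance earlier on. The sketch below uses external regret (comparison with the best fixed action on each window) for brevity; the function name `adaptive_regret` and the switching environment are hypothetical, and this brute-force check is not one of the paper's algorithms. Equal weight on every round of an interval is one special case of per-round weights.

```python
from typing import List, Sequence, Tuple


def adaptive_regret(
    actions: Sequence[int],
    rewards: Sequence[Sequence[float]],
) -> Tuple[float, Tuple[int, int]]:
    """Largest external regret over any contiguous window [s, e) of rounds.

    actions[t]: action played in round t
    rewards[t][b]: reward action b would have earned in round t (full information)
    Returns (regret, (s, e)) for the worst window; O(T^2 * n) brute force.
    """
    T, n = len(actions), len(rewards[0])
    best, best_window = 0.0, (0, 0)
    for s in range(T):
        cum: List[float] = [0.0] * n  # reward of each fixed action on [s, e)
        played = 0.0                  # reward actually collected on [s, e)
        for e in range(s + 1, T + 1):
            t = e - 1
            for b in range(n):
                cum[b] += rewards[t][b]
            played += rewards[t][actions[t]]
            regret = max(cum) - played
            if regret > best:
                best, best_window = regret, (s, e)
    return best, best_window


# Hypothetical switching environment: action 0 is best in rounds 0-4,
# action 1 is best in rounds 5-9. The algorithm never adapts and always plays 0.
rewards = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5
actions = [0] * 10

full_horizon = max(sum(r[b] for r in rewards) for b in range(2)) - sum(
    r[a] for r, a in zip(rewards, actions)
)
print(full_horizon)                       # 0.0
print(adaptive_regret(actions, rewards))  # (5.0, (5, 10))
```

Measured over the whole horizon, the non-adaptive algorithm looks regret-free, because no single fixed action beats it. Restricted to the window after the change, its regret is 5, which is what an adaptive regret bound rules out.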

Recommendation & Information Retrieval | RLHF & Preference Learning

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
