BAIR | Mar 9, 2026 | arXiv:2603.08371

Leaderboard Incentives: Model Rankings under Strategic Post-Training

Yatong Chen, Guanhua Zhang, Moritz Hardt

AI Summary

This paper formalizes benchmark-driven model development as a Stackelberg game between a benchmark designer and multiple model developers who strategically allocate resources to improve leaderboard scores. The authors prove that commonly used benchmarks can lead to games with no Nash equilibrium, explaining misaligned incentives and opaque strategizing. They then show that the "tune-before-test" protocol can induce a benchmark with a unique Nash equilibrium that accurately ranks models by latent quality.

Key Contribution

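Current ML benchmarks can induce games with no Nash equilibrium among model developers, so there is no stable outcome in which developers are incentivized to improve true model quality rather than just leaderboard scores. The tune-before-test protocol, under mild conditions, restores a unique equilibrium that ranks models by latent quality.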

Abstract

Influential benchmarks incentivize competing model developers to strategically allocate post-training resources toward improvements on the leaderboard, a phenomenon dubbed benchmaxxing or training on the test task. In this work, we initiate a principled study of the incentive structure that benchmarks induce. We model benchmarking as a Stackelberg game between a benchmark designer who chooses an evaluation protocol and multiple model developers who compete simultaneously in a subgame given by the designer's choice. Each competitor has a model of unknown latent quality and can inflate its observed score by allocating resources to benchmark-specific improvements. First, we prove that current benchmarks induce games for which no Nash equilibrium between model developers exists. This result suggests one explanation for why current practice leads to misaligned incentives, prompting model developers to strategize in opaque ways. However, we prove that under mild conditions, a recently proposed evaluation protocol, called tune-before-test, induces a benchmark with a unique Nash equilibrium that ranks models by latent quality. This positive result demonstrates that benchmarks need not set bad incentives, even if current evaluations do.
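The incentive argument in the abstract can be illustrated with a small toy game. The sketch below is not the paper's model: the prize, the cost per unit of effort, the saturation cap, the integer effort grid, and the way evaluator tuning is folded into the score are all assumptions chosen for illustration. It enumerates pure-strategy Nash equilibria only; a finite game like this one always has a mixed equilibrium, so the empty result shows a best-response cycle rather than the paper's non-existence theorem.

```python
from itertools import product

# All constants below are illustrative assumptions, not values from the paper.
PRIZE = 12                   # payoff for topping the leaderboard (split on ties)
COST = 3                     # cost per unit of benchmark-specific effort
CAP = 3                      # benchmark-specific gains saturate at this many points
EFFORTS = range(CAP + 1)     # each developer picks an integer effort level 0..CAP


def payoffs(efforts, qualities, tune=0):
    """Return each developer's payoff for one effort profile.

    Observed score = latent quality + capped benchmark-specific gain.
    `tune` is the evaluator's own fine-tuning budget applied to every model
    (a hypothetical stand-in for a tune-before-test style protocol).
    """
    scores = [q + min(CAP, e + tune) for q, e in zip(qualities, efforts)]
    top = max(scores)
    winners = [i for i, s in enumerate(scores) if s == top]
    return [
        (PRIZE / len(winners) if i in winners else 0) - COST * e
        for i, e in enumerate(efforts)
    ]


def pure_nash(qualities, tune=0):
    """Exhaustively list effort profiles with no profitable unilateral deviation."""
    equilibria = []
    for profile in product(EFFORTS, repeat=len(qualities)):
        base = payoffs(profile, qualities, tune)
        stable = True
        for i in range(len(qualities)):
            for deviation in EFFORTS:
                alt = profile[:i] + (deviation,) + profile[i + 1:]
                if payoffs(alt, qualities, tune)[i] > base[i]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


QUALITIES = (10, 11)  # latent qualities; developer 1 is truly better

print(pure_nash(QUALITIES))            # plain leaderboard: []
print(pure_nash(QUALITIES, tune=CAP))  # evaluator tunes to saturation: [(0, 0)]
```

In the plain leaderboard the weaker developer is tempted to overtake, the stronger one responds by matching, and the weaker one then drops out, so best responses cycle and no pure profile is stable. Once the evaluator's own tuning saturates the benchmark-specific gain, extra developer effort buys nothing, zero effort is the only stable profile, and the observed ranking matches latent quality.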

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
