This paper introduces a pessimism-free algorithm for offline learning in KL-regularized two-player zero-sum games, leveraging the smoothness of KL-regularized best responses and the stability of the Nash equilibrium. By avoiding pessimistic value estimation, the algorithm achieves an $\widetilde{\mathcal{O}}(1/n)$ sample complexity bound, a quadratic improvement over the $\widetilde{\mathcal{O}}(1/\sqrt{n})$ rate of prior pessimistic methods. The authors also present an efficient self-play policy optimization algorithm that attains the same fast rate using a number of iterations linear in the sample size.
Ditching pessimism unlocks a quadratically faster $\widetilde{\mathcal{O}}(1/n)$ sample complexity for offline learning in KL-regularized games.
We study offline learning in KL-regularized two-player zero-sum games, where policies are regularized toward a fixed reference policy via a KL-divergence penalty. Prior work relies on pessimistic value estimation to handle distribution shift, yielding only $\widetilde{\mathcal{O}}(1/\sqrt{n})$ statistical rates. We develop a new pessimism-free algorithm and analytical framework for KL-regularized games, built on the smoothness of KL-regularized best responses and a stability property of the Nash equilibrium induced by skew symmetry. This yields the first $\widetilde{\mathcal{O}}(1/n)$ sample complexity bound for offline learning in KL-regularized zero-sum games, achieved entirely without pessimism. We further propose an efficient self-play policy optimization algorithm and prove that, with a number of iterations linear in the sample size, it achieves the same fast $\widetilde{\mathcal{O}}(1/n)$ statistical rate as the minimax estimator.
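For concreteness, a KL-regularized zero-sum objective of this kind is typically written as follows (a generic sketch of the standard formulation, not necessarily the paper's exact notation; the payoff $r$, regularization strength $\beta > 0$, and reference policies $\pi_{\mathrm{ref}}$, $\nu_{\mathrm{ref}}$ are illustrative symbols):
$$
\max_{\pi}\,\min_{\nu}\; \mathbb{E}_{a\sim\pi,\, b\sim\nu}\big[r(a,b)\big] \;-\; \beta\,\mathrm{KL}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big) \;+\; \beta\,\mathrm{KL}\big(\nu \,\|\, \nu_{\mathrm{ref}}\big).
$$
Under an objective of this form, the best response to a fixed opponent $\nu$ has the closed-form Gibbs shape $\pi^{\star}(a) \propto \pi_{\mathrm{ref}}(a)\,\exp\!\big(\mathbb{E}_{b\sim\nu}[r(a,b)]/\beta\big)$, a smooth function of the opponent's policy; this is the kind of smoothness of KL-regularized best responses that the abstract refers to.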