ENSAE Paris -CRESTINRIAParis-SaclayApr 16, 2026arXiv:2604.15242

Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

Côme Fiegel, Come Fiegel, Pierre Menard, Pierre Ménard, Tadashi Kozuno, Michal Valko, Vianney Perchet

AI Summary

This paper addresses the challenge of achieving optimal last-iterate convergence in zero-sum matrix games with bandit feedback, where players are uncoupled. They demonstrate that using a log-barrier regularization within an online mirror descent framework, combined with a dual-focused analysis, achieves an exploitability gap of O-tilde(t^{-1/4}) with high probability, improving upon existing algorithms. The approach is further extended to extensive-form games, achieving the same convergence rate.

Key Contribution

Log-barrier regularization unlocks optimal O-tilde(t^{-1/4}) last-iterate convergence in uncoupled matrix games with bandit feedback, finally closing the gap to the theoretical limit.

Abstract

We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound on the exploitability gap of Omega(t^{-1/4}). Some online mirror descent algorithms were proposed in the literature for this problem, but none have truly attained this rate yet. We show that the use of a log-barrier regularization, along with a dual-focused analysis, allows this O-tilde(t^{-1/4}) convergence with high-probability. We additionally extend our idea to the setting of extensive-form games, proving a bound with the same rate.

Natural Language Processing Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References15

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

Related Papers