Feb 24, 2026 | arXiv:2602.20555

Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,λ}$ Targets

AI Summary

This paper proves that standard Transformers can approximate Hölder functions $C^{s,λ}([0,1]^{d\times n})$ under the $L^t$ distance with arbitrary precision. Based on this approximation result, the authors demonstrate that standard Transformers achieve the minimax optimal rate in nonparametric regression for Hölder target functions. The work introduces the size tuple and dimension vector metrics to characterize Transformer structures, facilitating future research on generalization and optimization.

Key Contribution

Transformers are provably minimax optimal for nonparametric regression with Hölder target functions, offering a theoretical underpinning for their empirical success.

Abstract

The tremendous success of Transformer models in fields such as large language models and computer vision necessitates a rigorous theoretical investigation. To the best of our knowledge, this paper is the first work proving that standard Transformers can approximate Hölder functions $C^{s,λ}\left([0,1]^{d\times n}\right)$ ($s\in\mathbb{N}_{\geq 0}$, $0<λ\leq 1$) under the $L^t$ distance ($t \in [1, \infty]$) with arbitrary precision. Building upon this approximation result, we demonstrate that standard Transformers achieve the minimax optimal rate in nonparametric regression for Hölder target functions. By introducing two metrics, the size tuple and the dimension vector, we provide a fine-grained characterization of Transformer structures, which facilitates future research on the generalization and optimization errors of Transformers with different structures. As intermediate results, we also derive upper bounds for the Lipschitz constant of standard Transformers and for their memorization capacity, which may be of independent interest. These findings provide theoretical justification for the powerful capabilities of Transformer models.
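For orientation, the Hölder class and the minimax rate named in the abstract can be written out as below. This is a sketch using one common textbook convention, not the paper's own statement: the symbols $D = dn$ (input dimension), $β = s + λ$ (smoothness), $N$ (sample size), the estimator $\hat f_N$, and the radius $B$ are assumptions introduced here, and the paper's exact norm, loss, and any logarithmic factors may differ.

```latex
% Hölder norm on [0,1]^{d \times n}, identified with [0,1]^D for D = dn
% (assumed convention; other texts use sums instead of maxima).
\[
  \|f\|_{C^{s,\lambda}}
  = \max_{|\alpha| \le s} \sup_{x} \bigl|\partial^{\alpha} f(x)\bigr|
  + \max_{|\alpha| = s} \sup_{x \ne y}
    \frac{\bigl|\partial^{\alpha} f(x) - \partial^{\alpha} f(y)\bigr|}{\|x - y\|^{\lambda}}
\]

% Classical minimax rate for squared L^2 risk over a Hölder ball of radius B,
% with smoothness \beta = s + \lambda and N i.i.d. samples.
\[
  \inf_{\hat f_N} \; \sup_{\|f\|_{C^{s,\lambda}} \le B}
  \mathbb{E}\,\bigl\|\hat f_N - f\bigr\|_{L^2}^{2}
  \;\asymp\; N^{-\frac{2\beta}{2\beta + D}},
  \qquad \beta = s + \lambda, \quad D = dn.
\]
```

Under this convention the rate degrades as the input dimension $D = dn$ grows and improves as the smoothness $β$ grows, which is the benchmark an estimator must match to be called minimax optimal.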

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
