Feb 17, 2026 · arXiv:2602.15503

Approximation Theory for Lipschitz Continuous Transformers

Takashi Furuya, Davide Murari, Carola-Bibiane Schönlieb

AI Summary

This paper introduces a class of in-context Transformers that are Lipschitz-continuous by construction, realizing MLP and attention blocks as explicit Euler steps of negative gradient flows. The work addresses the lack of approximation-theoretic guarantees for Lipschitz-constrained Transformers, which are important for stability and robustness. The authors prove a universal approximation theorem for this class within a Lipschitz-constrained function space, using a measure-theoretic formalism to achieve token-count independence in the approximation guarantees.

Key Contribution

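Lipschitz-constrained Transformers built from gradient flows are provably universal approximators within a Lipschitz-constrained function space, with guarantees that do not depend on the number of tokens. This gives a rigorous foundation for more robust and stable architectures.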

Abstract

Stability and robustness are critical for deploying Transformers in safety-sensitive settings. A principled way to enforce such behavior is to constrain the model's Lipschitz constant. However, approximation-theoretic guarantees for architectures that explicitly preserve Lipschitz continuity have yet to be established. In this work, we bridge this gap by introducing a class of gradient-descent-type in-context Transformers that are Lipschitz-continuous by construction. We realize both MLP and attention blocks as explicit Euler steps of negative gradient flows, ensuring inherent stability without sacrificing expressivity. We prove a universal approximation theorem for this class within a Lipschitz-constrained function space. Crucially, our analysis adopts a measure-theoretic formalism, interpreting Transformers as operators on probability measures, to yield approximation guarantees independent of token count. These results provide a rigorous theoretical foundation for the design of robust, Lipschitz continuous Transformer architectures.
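The idea of realizing a block as an explicit Euler step of a negative gradient flow can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the energy E(x) = 0.5 * ||relu(W x + b)||^2, the ReLU activation, the step-size rule, and all function names are hypothetical. Only the MLP-type block is sketched; the attention block and the measure-theoretic formulation are not covered. For this energy, the gradient is W^T relu(W x + b), which is Lipschitz with constant ||W||_2^2, so the map x -> x - h * grad E(x) is 1-Lipschitz whenever h <= 2 / ||W||_2^2 (a standard fact for convex functions with Lipschitz gradient).

```python
import math
import random


def matvec(W, x):
    """Return W @ x for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matTvec(W, y):
    """Return W.T @ y without forming the transpose."""
    return [sum(W[i][j] * y[i] for i in range(len(W))) for j in range(len(W[0]))]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))


def safe_step_size(W):
    """Assumed step rule: h = 1 / ||W||_F^2.

    Since ||W||_2 <= ||W||_F, this guarantees h <= 2 / ||W||_2^2.
    """
    fro_sq = sum(w * w for row in W for w in row)
    return 1.0 / fro_sq if fro_sq > 0 else 1.0


def gradient_step(x, W, b, h):
    """One explicit Euler step of x' = -grad E(x).

    Illustrative energy (an assumption): E(x) = 0.5 * ||relu(W x + b)||^2,
    whose gradient is W^T relu(W x + b).
    """
    pre = [p + bi for p, bi in zip(matvec(W, x), b)]
    act = [max(0.0, t) for t in pre]
    grad = matTvec(W, act)
    return [xi - h * gi for xi, gi in zip(x, grad)]


def max_expansion_ratio(W, b, h, trials=1000, seed=0):
    """Largest observed ||F(x) - F(y)|| / ||x - y|| over random input pairs."""
    rng = random.Random(seed)
    dim = len(W[0])
    worst = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        fx = gradient_step(x, W, b, h)
        fy = gradient_step(y, W, b, h)
        num = norm([a - c for a, c in zip(fx, fy)])
        den = norm([a - c for a, c in zip(x, y)])
        if den > 0:
            worst = max(worst, num / den)
    return worst
```

With a random W and b and h = safe_step_size(W), max_expansion_ratio(W, b, h) should stay at or below 1 up to floating-point rounding, which is an empirical check of the 1-Lipschitz property rather than a proof. Composing several such steps keeps the overall map 1-Lipschitz, since a composition of nonexpansive maps is nonexpansive.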

Architecture Design (Transformers, SSMs, MoE) · Red-Teaming & Adversarial Robustness · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
