School of Information Science, SYSU | Mar 3, 2026 | arXiv:2603.03084

On the Expressive Power of Transformers for Maxout Networks and Continuous Piecewise Linear Functions

AI Summary

The paper investigates the expressive power of Transformer networks by connecting them to maxout networks and continuous piecewise linear (CPWL) functions. The authors show that Transformers can approximate maxout networks with comparable model complexity and thereby inherit the universal approximation property of ReLU networks. They further analyze the number of linear regions Transformers can represent, showing that it grows exponentially with depth, and draw structural insights into the roles of the self-attention and feedforward layers.
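
The maxout connection can be made concrete with a minimal Python sketch. A rank-k maxout unit outputs the maximum of k affine maps of its input, and a ReLU neuron is the rank-2 special case in which one of the two affine pieces is identically zero. This is the elementary reason maxout networks can represent ReLU networks. The function names below (`affine`, `maxout_unit`, `relu_neuron`, `relu_as_maxout`) are illustrative only and do not come from the paper.

```python
from typing import List, Sequence


def affine(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Evaluate the affine map w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def maxout_unit(x: Sequence[float], weights: List[Sequence[float]], biases: Sequence[float]) -> float:
    """Rank-k maxout unit: the maximum over k affine pieces w_j . x + b_j."""
    return max(affine(w, b, x) for w, b in zip(weights, biases))


def relu_neuron(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Standard ReLU neuron max(w . x + b, 0)."""
    return max(affine(w, b, x), 0.0)


def relu_as_maxout(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """The same ReLU neuron written as a rank-2 maxout unit whose second piece is the zero map."""
    zero = [0.0] * len(w)
    return maxout_unit(x, [w, zero], [b, 0.0])


if __name__ == "__main__":
    w, b = [1.5, -2.0], 0.25
    for x in ([1.0, 0.0], [0.0, 1.0], [-1.0, 3.0]):
        # Both columns agree: the ReLU neuron and its maxout form are the same function.
        print(x, relu_neuron(x, w, b), relu_as_maxout(x, w, b))
```

Allowing more affine pieces per unit (a larger rank k) lets a single maxout unit represent any convex piecewise linear function with at most k pieces, since such a function equals the maximum of its affine pieces.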

Key Contribution

Transformers, like ReLU networks, are universal approximators, and their expressive power for continuous piecewise linear functions, measured by the number of linear regions they can represent, grows exponentially with depth.
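
The exponential growth of linear regions with depth can be illustrated with the classic sawtooth (tent map) construction from the ReLU-network literature. It is a standard example, not the paper's Transformer construction. A one-hidden-layer ReLU block with two units computes the tent map t(x) = 2·ReLU(x) - 4·ReLU(x - 1/2) on [0, 1]; composing it k times gives a function with 2^k linear pieces while the width stays fixed at two units per layer. The sketch below counts the pieces exactly with rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction


def relu(z: Fraction) -> Fraction:
    return max(z, Fraction(0))


def tent(x: Fraction) -> Fraction:
    """One ReLU layer with two units: on [0, 1] this is 2x, then 2 - 2x."""
    return 2 * relu(x) - 4 * relu(x - Fraction(1, 2))


def sawtooth(x: Fraction, depth: int) -> Fraction:
    """Compose the tent layer `depth` times."""
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(depth: int) -> int:
    """Count maximal linear pieces of sawtooth(., depth) on [0, 1].

    Breakpoints of the depth-k sawtooth are multiples of 1 / 2**k, so a grid of
    spacing 1 / 2**(k + 1) contains all of them and each grid cell is linear.
    """
    n = 2 ** (depth + 1)
    xs = [Fraction(i, n) for i in range(n + 1)]
    ys = [sawtooth(x, depth) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n)]
    # A new piece starts wherever the slope changes between adjacent cells.
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)


if __name__ == "__main__":
    print([count_linear_pieces(k) for k in range(6)])  # prints [1, 2, 4, 8, 16, 32]
```

For comparison, a single hidden layer of m ReLU units on a scalar input has at most m + 1 linear pieces, so it is arranging the same 2k units across k layers that yields 2^k pieces.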

Abstract

Transformer networks have achieved remarkable empirical success across a wide range of applications, yet their theoretical expressive power remains insufficiently understood. In this paper, we study the expressive capabilities of Transformer architectures. We first establish an explicit approximation of maxout networks by Transformer networks while preserving comparable model complexity. As a consequence, Transformers inherit the universal approximation capability of ReLU networks under similar complexity constraints. Building on this connection, we develop a framework to analyze the approximation of continuous piecewise linear functions by Transformers and quantitatively characterize their expressivity via the number of linear regions, which grows exponentially with depth. Our analysis establishes a theoretical bridge between approximation theory for standard feedforward neural networks and Transformer architectures. It also yields structural insights into Transformers: self-attention layers implement max-type operations, while feedforward layers realize token-wise affine transformations.
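
The structural reading in the abstract (self-attention performs max-type operations, feedforward layers apply token-wise affine maps) can be illustrated with a toy Python sketch. It rests on simplifying assumptions that are not taken from the paper: each token holds one affine piece w_j · x + b_j produced by a token-wise affine step, and a single attention query scores each token by beta times its value. As the inverse temperature beta grows, softmax attention concentrates on the largest value, so the pooled output approaches the maxout value max_j (w_j · x + b_j); hard (argmax) attention recovers it exactly. All names here are hypothetical.

```python
import math
from typing import List, Sequence


def token_wise_affine(x: Sequence[float], weights: List[Sequence[float]], biases: Sequence[float]) -> List[float]:
    """Feedforward-style step (assumed layout): token j holds the affine piece w_j . x + b_j."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(weights, biases)]


def softmax_attention_pool(values: Sequence[float], beta: float) -> float:
    """Single-query softmax attention whose score for token j is beta * values[j] (assumed scoring rule)."""
    top = max(values)
    # Subtracting the maximum before exponentiating avoids overflow for large beta.
    exps = [math.exp(beta * (v - top)) for v in values]
    return sum(e * v for e, v in zip(exps, values)) / sum(exps)


def hard_attention_pool(values: Sequence[float]) -> float:
    """Hard (argmax) attention returns the value of the highest-scoring token."""
    return max(values)


if __name__ == "__main__":
    x = [2.0, -1.0]
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    biases = [0.0, 0.0, 0.0]
    tokens = token_wise_affine(x, weights, biases)  # one affine piece per token
    for beta in (1.0, 5.0, 20.0):
        print(beta, softmax_attention_pool(tokens, beta))
    print("max", hard_attention_pool(tokens))
```

The softmax output is always a convex combination of the token values, so it never exceeds the maximum, and the gap shrinks as beta grows.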

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
