This paper introduces a tropical geometry framework to analyze the expressivity of Transformers, modeling self-attention as a vector-valued tropical rational map. The authors prove that self-attention computes a Power Voronoi Diagram in the zero-temperature limit and show that multi-head attention expands polyhedral complexity to O(N^H), overcoming the O(N) bottleneck of single heads. The work derives tight asymptotic bounds on the number of linear regions in transformers, showing a combinatorial explosion of Θ(N^(d_model*L)) and demonstrating that finite-temperature soft attention preserves these topological partitions.
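As a worked illustration of the zero-temperature claim (the notation $x$, $k_j$, $b_j$, $s_j$, $T$, $w_j$ below is ours, and the affine score form $s_j(x) = \langle x, k_j \rangle + b_j$ is an assumed simplification, not taken from the paper): temperature-$T$ attention weights collapse onto the maximizing key as $T \to 0$,

$$\lim_{T \to 0} \frac{\exp(s_j(x)/T)}{\sum_{i=1}^{N} \exp(s_i(x)/T)} = \mathbf{1}\{\, j = \arg\max_i s_i(x) \,\},$$

and the resulting cell $\{x : s_j(x) \ge s_i(x)\ \forall i\}$ coincides with the power-diagram cell $\{x : \|x - k_j\|^2 - w_j \le \|x - k_i\|^2 - w_i\ \forall i\}$ for weights $w_j = \|k_j\|^2 + 2 b_j$, since the quadratic term $\|x\|^2$ cancels on both sides of each inequality.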
Transformers' expressivity explodes combinatorially with sequence length, embedding dimension, and depth, reaching Θ(N^(d_model*L)) linear regions.
To quantify the geometric expressivity of transformers, we introduce a tropical geometry framework to characterize their exact spatial partitioning capabilities. By modeling self-attention as a vector-valued tropical rational map, we prove it evaluates exactly to a Power Voronoi Diagram in the zero-temperature limit. Building on this equivalence, we establish a combinatorial rationale for Multi-Head Self-Attention (MHSA): via the Minkowski sum of Newton polytopes, multi-head aggregation expands the polyhedral complexity to $\mathcal{O}(N^H)$, overcoming the $\mathcal{O}(N)$ bottleneck of single heads. Extending this to deep architectures, we derive the first tight asymptotic bounds on the number of linear regions in transformers ($\Theta(N^{d_{\text{model}}L})$), demonstrating a combinatorial explosion driven intrinsically by sequence length $N$, ambient embedding dimension $d_{\text{model}}$, and network depth $L$. Importantly, we guarantee that this idealized polyhedral skeleton is geometrically stable: finite-temperature soft attention preserves these topological partitions via exponentially tight differential approximation bounds.
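To make the zero-temperature equivalence concrete numerically, here is a minimal sketch (our own illustration, not code from the paper; the names `keys`, `bias`, `soft_attention_winner`, `power_cell` and the affine-score attention model are assumptions) checking that, at a small temperature, the key picked by soft attention matches the power-diagram cell assignment:

```python
# Minimal numerical sketch (illustrative, not the paper's code): as the softmax
# temperature T -> 0, attention over keys k_j with scores <x, k_j> + b_j
# collapses to the argmax, and the winning index matches the power-diagram rule
# argmin_j ||x - k_j||^2 - w_j with weights w_j = ||k_j||^2 + 2*b_j.
import numpy as np

rng = np.random.default_rng(0)
d, N = 4, 6
keys = rng.normal(size=(N, d))          # key vectors k_j
bias = rng.normal(size=N)               # per-key offsets b_j
w = (keys ** 2).sum(axis=1) + 2 * bias  # power-diagram weights w_j

def soft_attention_winner(x, T):
    """Index of the largest soft-attention weight at temperature T."""
    scores = keys @ x + bias
    probs = np.exp((scores - scores.max()) / T)
    probs /= probs.sum()
    return probs.argmax(), probs

def power_cell(x):
    """Power-diagram (weighted Voronoi) cell containing x."""
    return int(np.argmin(((x - keys) ** 2).sum(axis=1) - w))

for _ in range(5):
    x = rng.normal(size=d)
    j_soft, probs = soft_attention_winner(x, T=1e-3)
    assert j_soft == power_cell(x)  # both rules select the same key
    print(f"query -> key {j_soft}, top attention weight {probs.max():.4f}")
```

The shift $w_j = \|k_j\|^2 + 2 b_j$ is what turns the argmax of affine scores into a weighted nearest-key rule, which is why the zero-temperature cells form a power diagram rather than an ordinary Voronoi diagram.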