This paper introduces a tropical geometry framework to analyze the expressivity of Transformers, modeling self-attention as a vector-valued tropical rational map. The authors prove that self-attention computes a Power Voronoi Diagram in the zero-temperature limit and show that multi-head attention expands polyhedral complexity to O(N^H), overcoming the O(N) bottleneck of single heads. The work derives tight asymptotic bounds on the number of linear regions in transformers, showing a combinatorial explosion of Θ(N^(d_model*L)) and demonstrating that finite-temperature soft attention preserves these topological partitions.
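As a worked illustration of the zero-temperature claim (the notation $x$, $k_j$, $b_j$, $s_j$, $T$, $w_j$ below is ours, and the affine score form $s_j(x) = \langle x, k_j \rangle + b_j$ is an assumed simplification, not taken from the paper): temperature-$T$ attention weights collapse onto the maximizing key as $T \to 0$,

$$\lim_{T \to 0} \frac{\exp(s_j(x)/T)}{\sum_{i=1}^{N} \exp(s_i(x)/T)} = \mathbf{1}\{\, j = \arg\max_i s_i(x) \,\},$$

and the resulting cell $\{x : s_j(x) \ge s_i(x)\ \forall i\}$ coincides with the power-diagram cell $\{x : \|x - k_j\|^2 - w_j \le \|x - k_i\|^2 - w_i\ \forall i\}$ for weights $w_j = \|k_j\|^2 + 2 b_j$, since the quadratic term $\|x\|^2$ cancels on both sides of each inequality.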
Transformers' expressivity explodes combinatorially with sequence length, embedding dimension, and depth, reaching Θ(N^(d_model*L)) linear regions.
To quantify the geometric expressivity of transformers, we introduce a tropical geometry framework to characterize their exact spatial partitioning capabilities. By modeling self-attention as a vector-valued tropical rational map, we prove it evaluates exactly to a Power Voronoi Diagram in the zero-temperature limit. Building on this equivalence, we establish a combinatorial rationale for Multi-Head Self-Attention (MHSA): via the Minkowski sum of Newton polytopes, multi-head aggregation expands the polyhedral complexity to $\mathcal{O}(N^H)$, overcoming the $\mathcal{O}(N)$ bottleneck of single heads. Extending this to deep architectures, we derive the first tight asymptotic bounds on the number of linear regions in transformers ($\Theta(N^{d_{\text{model}}L})$), demonstrating a combinatorial explosion driven intrinsically by sequence length $N$, ambient embedding dimension $d_{\text{model}}$, and network depth $L$. Importantly, we guarantee that this idealized polyhedral skeleton is geometrically stable: finite-temperature soft attention preserves these topological partitions via exponentially tight differential approximation bounds.
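To make the zero-temperature equivalence concrete numerically, here is a minimal sketch (our own illustration, not code from the paper; the names `keys`, `bias`, `soft_attention_winner`, `power_cell` and the affine-score attention model are assumptions) checking that, at a small temperature, the key picked by soft attention matches the power-diagram cell assignment:

```python
# Minimal numerical sketch (illustrative, not the paper's code): as the softmax
# temperature T -> 0, attention over keys k_j with scores <x, k_j> + b_j
# collapses to the argmax, and the winning index matches the power-diagram rule
# argmin_j ||x - k_j||^2 - w_j with weights w_j = ||k_j||^2 + 2*b_j.
import numpy as np

rng = np.random.default_rng(0)
d, N = 4, 6
keys = rng.normal(size=(N, d))          # key vectors k_j
bias = rng.normal(size=N)               # per-key offsets b_j
w = (keys ** 2).sum(axis=1) + 2 * bias  # power-diagram weights w_j

def soft_attention_winner(x, T):
    """Index of the largest soft-attention weight at temperature T."""
    scores = keys @ x + bias
    probs = np.exp((scores - scores.max()) / T)
    probs /= probs.sum()
    return probs.argmax(), probs

def power_cell(x):
    """Power-diagram (weighted Voronoi) cell containing x."""
    return int(np.argmin(((x - keys) ** 2).sum(axis=1) - w))

for _ in range(5):
    x = rng.normal(size=d)
    j_soft, probs = soft_attention_winner(x, T=1e-3)
    assert j_soft == power_cell(x)  # both rules select the same key
    print(f"query -> key {j_soft}, top attention weight {probs.max():.4f}")
```

The shift $w_j = \|k_j\|^2 + 2 b_j$ is what turns the argmax of affine scores into a weighted nearest-key rule, which is why the zero-temperature cells form a power diagram rather than an ordinary Voronoi diagram.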