May 6, 2026 arXiv:2605.04893

Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

Dominik Dahlem, Diego Maniloff, Mac Misiura

AI Summary

This paper analyzes the limitations of symmetric spectral methods for diagnosing attention failures in large language models, proving that transpose-invariant spectral diagnostics are inherently orientation-blind and cannot detect information-flow direction. The authors establish the asymmetry coefficient $G$ as the control parameter for direction and derive a closed-form bipartite-Cheeger landscape for canonical causal architectures, showing that uniform causal attention and window attention fail with differently shaped profiles. Under length-controlled evaluation on models up to 8B parameters, transport features retain interpretable signal and show the predicted polarity reversal between the HaluEval and MedHallu hallucination benchmarks.

Key Contribution

Symmetric spectral analysis of attention is fundamentally blind to information flow direction, but a simple asymmetry coefficient can restore the signal.

Abstract

Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a quantitative converse establishing the asymmetry coefficient $G$ as the unique control parameter for direction. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $φ\ge 1/5$ with worst cut at $t^\ast/n \approx 0.32$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. The resulting two-axis diagnostic ($φ$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (LC-AUROC from 0.62 to 0.84) on tested models up to 8B parameters, with polarity reversing as predicted between HaluEval and MedHallu.
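The orientation-blindness claim can be illustrated with a minimal sketch in plain Python. It rests on several assumptions that are not taken from the paper: degree normalization is skipped, the toy operator is uniform causal attention on 6 positions, and `signed_asymmetry` is a hypothetical stand-in for a direction-sensitive quantity, not the paper's coefficient $G$ (whose definition is not given here). The point is only that an operator and its transpose share the same symmetric part, so any diagnostic computed from that part alone cannot tell them apart, while a signed measure of the skew part can.

```python
def uniform_causal_attention(n):
    """Row i attends uniformly to positions 0..i (lower-triangular, row-stochastic)."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]


def transpose(m):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def symmetric_part(m):
    """Return (M + M^T) / 2, the component that symmetric spectral methods analyze."""
    n = len(m)
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]


def signed_asymmetry(m):
    """Hypothetical direction measure (NOT the paper's G).

    (mass below the diagonal - mass above it) / total off-diagonal mass.
    It is +1 when all off-diagonal mass lies below the diagonal and -1 when
    all of it lies above; transposing the matrix flips the sign.
    """
    n = len(m)
    lower = sum(m[i][j] for i in range(n) for j in range(i))
    upper = sum(m[i][j] for i in range(n) for j in range(i + 1, n))
    total = lower + upper
    return 0.0 if total == 0 else (lower - upper) / total


A = uniform_causal_attention(6)
At = transpose(A)

# The symmetric parts coincide entry by entry, so every eigenvalue-based
# diagnostic of the symmetric part is identical for A and its transpose.
same_symmetric_part = symmetric_part(A) == symmetric_part(At)

print("symmetric parts identical:", same_symmetric_part)
print("signed asymmetry of A:    ", signed_asymmetry(A))
print("signed asymmetry of A^T:  ", signed_asymmetry(At))
```

In this toy case the symmetric parts match exactly while the stand-in measure changes sign under transposition, which is the kind of information a second, direction-sensitive axis would need to carry.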

Architecture Design (Transformers, SSMs, MoE), Interpretability & Mechanistic Interp, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
