Apr 6, 2026 | arXiv:2604.04469

Same Geometry, Opposite Noise: Transformer Magnitude Representations Lack Scalar Variability

AI Summary

This paper investigates whether Transformer language models exhibit scalar variability, a property where representational noise scales proportionally with magnitude, as observed in biological systems. Analyzing hidden-state representations of numerical magnitudes in Llama-3-8B and Mistral-7B models, the authors found that representational variability *decreases* with magnitude, contrary to scalar variability. This suggests that distributional learning in Transformers captures magnitude geometry but fails to reproduce the noise characteristics of biological magnitude systems.

Key Contribution

Transformers get the magnitude geometry right, but completely botch the noise: unlike brains, their representations become *less* variable for larger numbers.

Abstract

Scalar variability -- the finding that representational noise scales proportionally with magnitude, producing a constant coefficient of variation -- is a hallmark of biological magnitude systems. We tested whether transformer language models exhibit this property by analysing the dispersion of hidden-state representations across carrier sentences for 26 numerical magnitudes in three 7-8B parameter models (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-3-8B-Base; data from Cacioli, 2026). We found the opposite: representational variability decreased with magnitude along the magnitude axis (scaling exponent alpha approx -0.19; 0/16 primary layers with alpha > 0, all three models). The negative sign was consistent in full-dimensional space (alpha approx -0.04) and after sentence-identity correction (alpha approx -0.007). The anti-scalar pattern was 3-5x stronger along the magnitude axis than orthogonal dimensions, and corpus frequency strongly predicted per-magnitude variability (rho = .84). These results demonstrate that distributional learning alone is insufficient to produce scalar variability: transformers reproduce log-compressive magnitude geometry but not the constant-CV noise signature observed in biological systems.
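The scaling exponent in the abstract can be illustrated with a minimal sketch. It assumes that alpha is the slope of a least-squares fit of log dispersion on log magnitude, and that dispersion along the magnitude axis is the standard deviation of projected hidden states across carrier sentences. The abstract does not spell out the estimator, so the function names, the synthetic data, and the fitting procedure below are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def projected_sd(hidden_states, axis_vector):
    """SD across carrier sentences of hidden states projected onto one axis.

    hidden_states: array of shape (n_sentences, hidden_dim) for one magnitude.
    axis_vector:   array of shape (hidden_dim,), a hypothetical magnitude axis.
    """
    axis_unit = np.asarray(axis_vector, dtype=float)
    axis_unit = axis_unit / np.linalg.norm(axis_unit)
    projections = np.asarray(hidden_states, dtype=float) @ axis_unit
    return np.std(projections, ddof=1)


def scaling_exponent(magnitudes, dispersions):
    """Slope of log(dispersion) regressed on log(magnitude) (assumed estimator)."""
    log_m = np.log(np.asarray(magnitudes, dtype=float))
    log_d = np.log(np.asarray(dispersions, dtype=float))
    slope, _intercept = np.polyfit(log_m, log_d, 1)
    return slope


# Synthetic dispersions only; these are not values from the paper.
magnitudes = np.array([1, 2, 5, 10, 20, 50, 100, 1000], dtype=float)
proportional_sd = 0.15 * magnitudes        # SD grows in proportion to magnitude
shrinking_sd = 2.0 * magnitudes ** -0.19   # SD shrinks as magnitude grows

alpha_proportional = scaling_exponent(magnitudes, proportional_sd)
alpha_shrinking = scaling_exponent(magnitudes, shrinking_sd)

# Toy check of the projection step: two "sentences" in a 2-d hidden space.
toy_sd = projected_sd(np.array([[1.0, 0.0], [3.0, 0.0]]), np.array([2.0, 0.0]))
```

On this synthetic data, dispersion that grows in proportion to magnitude gives a slope of 1, while dispersion that shrinks with magnitude gives a negative slope, which is the sign pattern the abstract reports.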

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Open-Source Models & Weights

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
