This paper investigates whether Transformer language models exhibit scalar variability, a property of biological systems in which representational noise scales proportionally with magnitude. Analyzing hidden-state representations of 26 numerical magnitudes in Llama-3-8B (Base and Instruct) and Mistral-7B-Instruct models, the authors found that representational variability *decreases* with magnitude, the opposite of what scalar variability predicts. This suggests that distributional learning in Transformers captures magnitude geometry but fails to reproduce the noise characteristics of biological magnitude systems.
Transformers get the magnitude geometry right, but completely botch the noise: unlike brains, their representations become *less* variable for larger numbers.
Scalar variability (the finding that representational noise scales proportionally with magnitude, producing a constant coefficient of variation) is a hallmark of biological magnitude systems. We tested whether transformer language models exhibit this property by analysing the dispersion of hidden-state representations across carrier sentences for 26 numerical magnitudes in three 7–8B-parameter models (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-3-8B-Base; data from Cacioli, 2026). We found the opposite: representational variability decreased with magnitude along the magnitude axis (scaling exponent α ≈ −0.19; 0/16 primary layers with α > 0, in all three models). The negative sign held in full-dimensional space (α ≈ −0.04) and after correcting for sentence identity (α ≈ −0.007). The anti-scalar pattern was 3–5× stronger along the magnitude axis than along orthogonal dimensions, and corpus frequency strongly predicted per-magnitude variability (ρ = .84). These results demonstrate that distributional learning alone is insufficient to produce scalar variability: transformers reproduce log-compressive magnitude geometry but not the constant-CV noise signature observed in biological systems.
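To make the scaling exponent concrete, here is a minimal sketch of one way to estimate it. It assumes (hypothetically; the abstract does not give the paper's exact definitions) that α is the slope of a log-log least-squares fit of per-magnitude dispersion on magnitude, that the "magnitude axis" is the first principal component of the per-magnitude centroids, and that hidden states are arranged as an array of shape (magnitudes, carrier sentences, hidden dimensions). The function names, the magnitudes 1 to 26, and the synthetic data are all illustrative assumptions, not the authors' pipeline. Under this definition, spread proportional to n (constant CV) gives α = 1, α > 0 means noise grows with magnitude, and α < 0 is the anti-scalar pattern.

```python
import numpy as np


def magnitude_axis(hidden):
    """Hypothetical magnitude axis: first principal component of the
    per-magnitude centroids. hidden: (n_magnitudes, n_sentences, d)."""
    centroids = hidden.mean(axis=1)                    # (n_magnitudes, d)
    centered = centroids - centroids.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]                                       # unit vector; sign is arbitrary


def dispersion(hidden, axis=None):
    """Per-magnitude spread across carrier sentences.
    With an axis: SD of the projections onto it.
    Without: RMS distance to each magnitude's centroid in the full space."""
    if axis is not None:
        proj = hidden @ (axis / np.linalg.norm(axis))  # (n_magnitudes, n_sentences)
        return proj.std(axis=1, ddof=1)
    centered = hidden - hidden.mean(axis=1, keepdims=True)
    return np.sqrt((centered ** 2).sum(axis=2).mean(axis=1))


def scaling_exponent(magnitudes, spread):
    """alpha in spread ~ c * n**alpha, from a least-squares fit in log-log space.
    Under this definition, constant CV (spread proportional to n) gives alpha = 1."""
    alpha, _intercept = np.polyfit(np.log(magnitudes), np.log(spread), deg=1)
    return alpha


def synthetic_hidden(magnitudes, true_alpha, n_sentences=200, d=64, seed=0):
    """Toy data (not model activations): centroids sit at log(n) along one
    direction (log-compressive geometry), and sentence-to-sentence noise along
    that direction has SD 0.05 * n**true_alpha, plus tiny isotropic noise."""
    rng = np.random.default_rng(seed)
    direction = np.zeros(d)
    direction[0] = 1.0
    out = np.empty((len(magnitudes), n_sentences, d))
    for i, n in enumerate(magnitudes):
        iso = 0.001 * rng.standard_normal((n_sentences, d))
        along = 0.05 * n ** true_alpha * rng.standard_normal(n_sentences)
        out[i] = np.log(n) * direction + iso + along[:, None] * direction
    return out


magnitudes = np.arange(1, 27)  # hypothetical: 26 magnitudes, 1..26
for true_alpha in (1.0, -0.2):
    h = synthetic_hidden(magnitudes, true_alpha)
    a = scaling_exponent(magnitudes, dispersion(h, magnitude_axis(h)))
    print(f"true alpha {true_alpha:+.2f} -> fitted alpha {a:+.2f}")
```

Projecting onto a single axis before measuring spread mirrors the abstract's distinction between the magnitude axis and full-dimensional space; calling `dispersion(h)` without an axis gives the full-space variant under the same assumptions.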