This paper investigates whether the performance degradation of LLMs after low-rank compression can be predicted from spectral statistics of the weight matrices. The authors find that the interaction between compression ratio and stable rank ($\gamma \cdot \bar{\rho}_s$) is a strong predictor of accuracy degradation across several compression methods and two model families (Qwen3 and Gemma3). They also provide theoretical justification linking this predictor to SVD truncation bounds and error propagation in transformers.
Forget expensive compression trials – a simple spectral statistic can accurately predict how much your LLM will degrade *before* you even compress it.
Matrix-level low-rank compression is a promising way to reduce the cost of large language models, but running compression and evaluating the resulting models on language tasks can be prohibitively expensive. Can compression-induced degradation be predicted before committing to this compute? We systematically analyze the Qwen3 and Gemma3 model families across four representative low-rank compression methods: vanilla SVD, two ASVD variants, and SVD-LLM. We find that stable rank and information density, measured in bits per parameter, dominate performance degradation. The interaction term $\gamma \cdot \bar{\rho}_s$, defined as compression ratio times stable rank, is a robust predictor of accuracy degradation, achieving leave-one-out cross-validation Pearson correlations of $0.890$ for attention layers and $0.839$ for MLP layers. We provide theoretical intuition for why this predictor succeeds by connecting it to standard SVD truncation bounds and error composition mechanisms in transformer layers. These findings enable a predict-then-compress workflow: compute $\gamma \cdot \bar{\rho}_s$ from weights, estimate degradation, and invest compute only in desirable configurations.
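As a rough illustration of the first step of that workflow, the sketch below computes a stable rank per weight matrix and multiplies the average by a compression ratio. It uses the standard definition of stable rank, $\rho_s(W) = \|W\|_F^2 / \|W\|_2^2$. The abstract does not say how the paper defines $\gamma$ or how it aggregates $\bar{\rho}_s$, so the plain mean over matrices and the function names `stable_rank` and `interaction_term` are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def stable_rank(W: np.ndarray) -> float:
    """Standard stable rank: squared Frobenius norm over squared spectral norm."""
    # Singular values are returned in descending order, so s[0] is the spectral norm.
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)


def interaction_term(weights: list[np.ndarray], gamma: float) -> float:
    """gamma times the mean stable rank of the given matrices.

    Assumption: rho_bar is an unweighted mean over matrices, and gamma is
    whatever compression ratio the reader's setup uses; the paper may define
    both differently.
    """
    rho_bar = float(np.mean([stable_rank(W) for W in weights]))
    return gamma * rho_bar


# Toy matrices standing in for layer weights (not real model weights).
identity = np.eye(4)                                  # stable rank 4
rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 1.0])      # stable rank 1
score = interaction_term([identity, rank_one], gamma=0.5)
```

Turning this score into a degradation estimate would still require a fitted mapping from $\gamma \cdot \bar{\rho}_s$ to accuracy loss, which the abstract reports only through its correlation values.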