This paper investigates the failure modes of extreme quantization in LLMs, identifying two distinct mechanisms: Signal Degradation, caused by cumulative error, and Computation Collapse, where key components cease functioning. Through mechanistic analysis, the authors show that Signal Degradation can be mitigated with training-free interventions, while Computation Collapse requires more substantial structural modifications. The study provides a diagnostic framework for PTQ failures, highlighting the limitations of simple compensation strategies for severe quantization.
LLMs break in two fundamentally different ways when pushed to extreme quantization: either through gradual information loss or sudden functional breakdown of key components.
Post-Training Quantization (PTQ) is critical for the efficient deployment of Large Language Models (LLMs). While 4-bit quantization is widely regarded as an optimal trade-off, reducing precision to 2-bit usually triggers a catastrophic "performance cliff," and it remains unclear whether the mechanisms underlying failure at these two precisions differ fundamentally. We therefore conduct a systematic mechanistic analysis, revealing two qualitatively distinct failure modes: Signal Degradation, where computational patterns remain intact but information precision is impaired by cumulative error; and Computation Collapse, where key components cease to function, preventing correct information processing and destroying the signal in the early layers. Guided by this diagnosis, we perform mechanism-aware interventions, demonstrating that targeted, training-free repair can mitigate Signal Degradation but remains ineffective against Computation Collapse. Our findings provide a systematic diagnostic framework for PTQ failures and suggest that addressing Computation Collapse requires structural reconstruction rather than mere compensation.
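The "performance cliff" between 4-bit and 2-bit precision can be made concrete with a toy experiment. The sketch below is not the paper's method; it applies a standard uniform symmetric round-to-nearest quantizer (a common PTQ baseline) to a synthetic Gaussian weight tensor and compares the per-weight reconstruction error at the two bit widths. The tensor shape and distribution are illustrative assumptions.

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric round-to-nearest quantization, dequantized back to float."""
    qmax = 2 ** (bits - 1) - 1              # largest positive integer level (7 for 4-bit, 1 for 2-bit)
    scale = np.abs(w).max() / qmax          # map the largest weight onto the top level
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                        # dequantized approximation of w

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=10_000)      # toy Gaussian "weight tensor"

for bits in (4, 2):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits}-bit mean absolute error: {err:.5f}")
```

At 2-bit there are only four representable levels, so the mean error jumps by roughly an order of magnitude; real LLM layers compound such errors across depth, which is the cumulative-error mechanism the abstract attributes to Signal Degradation.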