Apr 16, 2026 | arXiv:2604.15167

When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence

AI Summary

The paper investigates why post-training INT4 quantization of language models can fail after FP32 convergence, identifying a three-phase divergence structure in which the INT4 gap explodes while FP32 perplexity remains stable. Analyzing Pythia-160m checkpoints, the authors find that divergence begins when FP32 perplexity converges rather than when learning rate decay starts, attribute it to post-convergence weight updates, and rule out weight outlier accumulation as the cause. They further show that an oscillatory learning rate schedule can mitigate the divergence only when its amplitude is carefully calibrated: SGDR warm restarts accelerate it, while the proposed Oscillatory Lock-In schedule reduces it.

Key Contribution

Even after a model appears fully trained in FP32, its INT4-quantized quality can degrade catastrophically, revealing a hidden vulnerability to post-convergence weight updates.

Abstract

Post-training quantization (PTQ) assumes that a well-converged model is a quantization-ready model. We show this assumption fails in a structured, measurable, and previously uncharacterized way. Using a calibration-free per-group INT4 probe applied to all 154 publicly available Pythia-160m training checkpoints, we identify a three-phase divergence structure: a rapid-learning phase where both FP32 perplexity and quantization robustness improve together, a meta-stable plateau lasting roughly 70,000 steps where FP32 perplexity stagnates but the INT4 gap remains bounded, and an explosive divergence phase where the INT4 gap compounds from 11% to 517% while FP32 perplexity barely moves. Critically, this divergence begins not when the learning rate starts decaying, but precisely when FP32 perplexity converges. This is a finer-grained onset predictor, and it implies that post-convergence weight updates, rather than decay magnitude alone, are the proximate cause. We further show that INT8 quantization is entirely immune throughout all three phases, constraining the mechanism to the coarseness of the 16-level INT4 grid specifically, and rule out weight outlier accumulation as the mechanism via direct kurtosis measurement. Finally, we conduct a controlled fork experiment from the pre-divergence checkpoint comparing three learning rate schedules (cosine continuation, SGDR warm restarts, and our proposed Oscillatory Lock-In, OLI) across nine independent runs. SGDR uniformly accelerates divergence (0/9 pairwise wins against cosine), while OLI's settled cool phases reduce the INT4 gap by 2.2 percentage points on average (t = -5.46, p < 0.0001), demonstrating that schedule amplitude calibration, not oscillation alone, determines whether perturbation helps or hurts. Our code, probe implementation, and all 154-checkpoint audit results are released publicly.
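To make the idea of a calibration-free per-group probe concrete, the following is a minimal sketch rather than the authors' released implementation. It assumes symmetric round-to-nearest quantization with one scale per group of 32 weights, and it assumes the "gap" is the relative perplexity increase over FP32; the function names, the group size, and the gap definition are illustrative assumptions not stated in the abstract.

```python
import numpy as np

def fake_quant_per_group(w, bits=4, group_size=32):
    """Quantize-dequantize w with one symmetric scale per group (hypothetical probe).

    No calibration data is used: each scale comes from the weights alone.
    Assumes w.size is divisible by group_size.
    """
    qmax = 2 ** (bits - 1) - 1           # 7 for INT4, 127 for INT8
    qmin = -(2 ** (bits - 1))            # -8 for INT4, so 16 integer levels
    flat = w.reshape(-1, group_size)
    max_abs = np.abs(flat).max(axis=1, keepdims=True)
    # Guard against all-zero groups to avoid division by zero.
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)
    q = np.clip(np.round(flat / scale), qmin, qmax)
    return (q * scale).reshape(w.shape)

def relative_gap(ppl_quant, ppl_fp32):
    """Relative perplexity gap in percent (assumed definition of the gap)."""
    return 100.0 * (ppl_quant - ppl_fp32) / ppl_fp32

# Toy comparison on random weights: the INT8 grid is far finer than the INT4 grid.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
err4 = np.abs(w - fake_quant_per_group(w, bits=4)).mean()
err8 = np.abs(w - fake_quant_per_group(w, bits=8)).mean()
print(f"mean abs error  INT4: {err4:.5f}  INT8: {err8:.5f}")
```

In a checkpoint audit along these lines, the same function would be applied to each weight matrix of each checkpoint, and perplexity would be re-evaluated on held-out text before computing the gap. The random-weight comparison above only illustrates the difference in grid coarseness between INT4 and INT8.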

Inference & Quantization | Open-Source Models & Weights | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 17

Year: 2026

Venue: N/A
