The paper introduces Activation- and Influence-Aware Ranks (AIR), an SVD-based compression framework for large language models (LLMs) that guides each weight matrix's low-rank approximation with a backward-signal influence metric. On its own, AIR exceeds ACIP. Relative to SVD-LLM(W), it improves perplexity by more than 18% at parameter retention of 60% or below and matches that baseline's quality with roughly 90% less calibration data. The parameter savings carry over to FLOPs, peak memory, and per-token latency.
AIR improves perplexity over SVD-LLM(W) by more than 18% at parameter retention of 60% or below, and matches its quality with about 90% less calibration data.
We present Activation- and Influence-Aware Ranks (AIR), an SVD-based LLM compression framework that guides each weight matrix's low-rank approximation with a backward-signal influence metric. Starting from the activation-aware optimum of SVD-LLM(W), AIR runs a single closed-form alternating least squares (ALS) sweep that integrates influence element-wise under a monotone-descent guarantee. AIR is layer-local and composes orthogonally with end-to-end methods: alone it exceeds ACIP, and AIR+LoRA outperforms it further. AIR improves perplexity over SVD-LLM(W) by >18% at <=60% parameter retention, matches its quality with ~90% less calibration data, and turns parameter savings into FLOP, peak-memory, and per-token latency gains.
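The abstract does not spell out AIR's objective, so the following is only a minimal sketch of the general idea: one alternating least squares sweep on an element-wise weighted low-rank problem, where every row and column update has a closed form and exactly minimizes its subproblem, so the weighted loss cannot increase. The weight matrix `M` (a stand-in for an influence metric), the weighted Frobenius objective, and all function names are assumptions for illustration. The sketch also starts from a plain truncated SVD rather than the activation-aware solution of SVD-LLM(W).

```python
import numpy as np


def rank_for_retention(m, n, retention):
    """Largest rank r whose factors (m x r and r x n) hold at most
    retention * m * n parameters, i.e. r * (m + n) <= retention * m * n."""
    return max(1, int(retention * m * n / (m + n)))


def truncated_svd_factors(W, r):
    """Rank-r factors A (m x r), B (r x n) from a plain truncated SVD.
    Assumption: stands in for the activation-aware starting point."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]


def weighted_loss(W, A, B, M):
    """Element-wise weighted squared error: sum_ij M_ij * (W - A B)_ij^2."""
    return float(np.sum(M * (W - A @ B) ** 2))


def als_sweep(W, A, B, M):
    """One ALS sweep: update every row of A, then every column of B.

    Each update solves the normal equations of a small weighted least
    squares problem, so it is an exact minimizer of its subproblem and
    the weighted loss is non-increasing across the sweep.
    """
    A, B = A.copy(), B.copy()
    for i in range(W.shape[0]):
        Bw = B * M[i]                      # scale column j of B by M[i, j]
        G = Bw @ B.T                       # (r x r) weighted Gram matrix
        A[i] = np.linalg.lstsq(G, Bw @ W[i], rcond=None)[0]
    for j in range(W.shape[1]):
        Aw = A * M[:, j][:, None]          # scale row i of A by M[i, j]
        G = A.T @ Aw                       # (r x r) weighted Gram matrix
        B[:, j] = np.linalg.lstsq(G, Aw.T @ W[:, j], rcond=None)[0]
    return A, B


# Toy usage with synthetic data (hypothetical weights, not real influence scores).
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 24))
M = rng.uniform(0.1, 1.1, size=W.shape)    # strictly positive weights
r = rank_for_retention(*W.shape, retention=0.6)
A0, B0 = truncated_svd_factors(W, r)
A1, B1 = als_sweep(W, A0, B0, M)
print(r, weighted_loss(W, A0, B0, M), weighted_loss(W, A1, B1, M))
```

Under these assumptions, the truncated SVD is optimal only for uniform weights, which is why a single weighted sweep can already lower the weighted loss when `M` is non-uniform. The rank helper reflects that a rank-r factorization of an m x n matrix stores r(m + n) parameters instead of mn.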