Feb 16, 2026 · arXiv:2602.15200

COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression

Denis Makhov, Dmitriy Shopkhoev, Magauiya Zhussip, Ammar Ali, Baher Mohammad, Stamatios Lefkimmiatis

AI Summary

The paper introduces COMPOT, a training-free post-training compression method for Transformers that leverages sparse dictionary learning with orthogonal dictionaries to represent weights as a union-of-subspaces. COMPOT avoids iterative optimization by using closed-form Procrustes updates for the dictionary and analytical single-step sparse coding for the coefficients, enabled by the orthogonal dictionaries. The method also incorporates a one-shot dynamic allocation strategy to redistribute layer-wise compression rates based on layer sensitivity, achieving a superior quality-compression trade-off compared to low-rank and sparse baselines.
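To make the "analytical single-step sparse coding" idea concrete, here is a minimal NumPy sketch. It assumes a plain unweighted objective, min ||W - D S||_F with at most k nonzeros per column of S, and a square orthogonal D; it does not use calibration data and is not the authors' implementation. The function name `sparse_code_orthogonal` and the parameter `k` are hypothetical. Because D is orthogonal, ||W - D S||_F equals ||D^T W - S||_F, so the best k-sparse code for each column simply keeps the k largest-magnitude entries of D^T W.

```python
import numpy as np

def sparse_code_orthogonal(W, D, k):
    """Keep the k largest-magnitude coefficients in each column of D.T @ W.

    Illustrative only: assumes D is square and orthogonal (D.T @ D = I).
    """
    C = D.T @ W  # exact dense coefficients, since D is orthogonal
    # Row indices of the k largest |entries| in every column.
    idx = np.argpartition(-np.abs(C), k - 1, axis=0)[:k]
    S = np.zeros_like(C)
    np.put_along_axis(S, idx, np.take_along_axis(C, idx, axis=0), axis=0)
    return S

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))                    # toy "weight" matrix
D, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # random orthogonal dictionary
S = sparse_code_orthogonal(W, D, k=3)
S_full = sparse_code_orthogonal(W, D, k=8)         # no sparsity: exact reconstruction
```

Each column of `S` may select a different set of dictionary atoms, which is what gives the union-of-subspaces flexibility that a single truncated SVD basis lacks.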

Key Contribution

Ditch the SVD and iterative sparse coding: COMPOT uses calibration data and orthogonal dictionaries for training-free Transformer compression with a superior quality-compression trade-off.

Abstract

Post-training compression of Transformer models commonly relies on truncated singular value decomposition (SVD). However, enforcing a single shared subspace can degrade accuracy even at moderate compression. Sparse dictionary learning provides a more flexible union-of-subspaces representation, but existing approaches often suffer from iterative dictionary and coefficient updates. We propose COMPOT (Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers), a training-free compression framework that uses a small calibration dataset to estimate a sparse weight factorization. COMPOT employs orthogonal dictionaries that enable closed-form Procrustes updates for the dictionary and analytical single-step sparse coding for the coefficients, eliminating iterative optimization. To handle heterogeneous layer sensitivity under a global compression budget, COMPOT further introduces a one-shot dynamic allocation strategy that adaptively redistributes layer-wise compression rates. Extensive experiments across diverse architectures and tasks show that COMPOT consistently delivers a superior quality-compression trade-off over strong low-rank and sparse baselines, while remaining fully compatible with post-training quantization for extreme compression. Code is available at https://github.com/mts-ai/COMPOT.
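The closed-form Procrustes update mentioned in the abstract can be illustrated with the classical orthogonal Procrustes solution. This is a sketch under simplifying assumptions, not the paper's code: it uses the unweighted objective min ||W - D S||_F over orthogonal D with S held fixed, and omits any calibration-data weighting. The name `procrustes_update` is hypothetical. The minimizer is D = U V^T, where U Σ V^T is the SVD of W S^T.

```python
import numpy as np

def procrustes_update(W, S):
    """Orthogonal D minimizing ||W - D @ S||_F for a fixed coefficient matrix S.

    Classical orthogonal Procrustes solution; illustrative only.
    """
    U, _, Vt = np.linalg.svd(W @ S.T)
    return U @ Vt

rng = np.random.default_rng(0)
D_true, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # ground-truth orthogonal dictionary
S = rng.standard_normal((6, 10))                       # fixed coefficients (dense, for the demo)
W = D_true @ S                                         # weights generated exactly from D_true
D = procrustes_update(W, S)
```

In this noise-free toy setup the update recovers `D_true` in one step, with no gradient descent or inner iterations, which is the property the abstract relies on to avoid iterative dictionary optimization.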

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 46

Year: 2026

Venue: N/A
