This paper introduces MASQuant, a post-training quantization framework tailored for Multimodal Large Language Models (MLLMs). It tackles smoothing misalignment by learning modality-specific smoothing factors and addresses cross-modal computational invariance via SVD whitening to enable unified quantization. Experiments on dual-modal and tri-modal MLLMs demonstrate MASQuant's competitive performance compared to state-of-the-art PTQ algorithms.
Achieve stable and competitive quantization for multimodal LLMs by explicitly accounting for modality-specific characteristics and cross-modal computational differences.
Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive with state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
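As a rough illustration of the two ideas the abstract builds on, the NumPy sketch below applies SmoothQuant-style per-channel smoothing to a linear layer, checks that the layer output is unchanged (computational invariance), and shows that factors calibrated on one modality differ from those calibrated on another. It is not the MASQuant implementation: the function names, the choice alpha = 0.5, the synthetic activations, and the artificial outlier channel in the "vision" data are all assumptions made for the example. MASQuant learns its modality-specific factors, whereas this sketch computes them in closed form.

```python
import numpy as np

def smoothing_factors(X, W, alpha=0.5, eps=1e-8):
    """SmoothQuant-style factors: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    X has shape (tokens, in_features); W has shape (in_features, out_features).
    """
    act_max = np.abs(X).max(axis=0)  # per-input-channel activation range
    w_max = np.abs(W).max(axis=1)    # per-input-channel weight range
    return np.maximum(act_max, eps) ** alpha / np.maximum(w_max, eps) ** (1 - alpha)

def smooth(X, W, s):
    """Move scale from activations into weights: (X / s) @ (diag(s) W) == X @ W."""
    return X / s, W * s[:, None]

rng = np.random.default_rng(0)
in_features, out_features = 8, 4
W = rng.normal(size=(in_features, out_features))

# Hypothetical calibration activations for two modalities.
X_text = rng.normal(size=(16, in_features))
X_vision = rng.normal(size=(16, in_features))
X_vision[:, 0] *= 50.0  # assumed outlier channel that appears only in vision tokens

s_text = smoothing_factors(X_text, W)
s_vision = smoothing_factors(X_vision, W)

# Computational invariance: smoothing does not change the layer output.
Xs, Ws = smooth(X_vision, W, s_vision)
invariant = np.allclose(X_vision @ W, Xs @ Ws)

# Misalignment: the two modalities call for different factors,
# so a single shared s cannot fit both activation distributions.
factors_differ = not np.allclose(s_text, s_vision)

print(invariant, factors_differ)
```

Because each modality's factors are folded into the weights, using separate factors yields a different smoothed weight matrix per modality. That is the gap the abstract's Cross-Modal Compensation step is described as closing; the sketch does not attempt that step.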