DAMO · Mar 5, 2026 · arXiv:2603.04800

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Lulu Hu, Wenhua Xiao, Xin Chen, Xin Xu, Bowen Xu, Kunrao Li, Yongliang Tao

AI Summary

This paper introduces MASQuant, a post-training quantization framework tailored for Multimodal Large Language Models (MLLMs). It tackles smoothing misalignment by learning modality-specific smoothing factors and addresses cross-modal computational invariance via SVD whitening to enable unified quantization. Experiments on dual-modal and tri-modal MLLMs demonstrate MASQuant's competitive performance compared to state-of-the-art PTQ algorithms.
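The smoothing step that MASQuant makes modality-aware can be illustrated with a small NumPy sketch. It uses SmoothQuant's closed-form rule s_j = max|X_j|^α / max|W_j|^(1-α) in place of the learned, modality-specific factors the paper describes. The text/vision split, the injected outlier channels, and the 8-bit per-tensor fake quantizer are illustrative assumptions, and `smoothing_factors`, `modality_aware_factors`, and `quant_error` are hypothetical helpers, not part of the MASQuant code base. Dividing activations by s and multiplying weight rows by s leaves X W unchanged (computational invariance). However, one plausible reading of Smoothing Misalignment is that a single s computed over all tokens is shaped by whichever modality has the largest channels.

```python
import numpy as np

def smoothing_factors(x, w, alpha=0.5, eps=1e-5):
    """SmoothQuant-style per-input-channel factors s_j = max|X_j|^a / max|W_j|^(1-a).

    x: activations, shape (tokens, in_features)
    w: weight, shape (in_features, out_features), so that y = x @ w
    """
    act_max = np.abs(x).max(axis=0)  # per input channel, over tokens
    w_max = np.abs(w).max(axis=1)    # per input channel, over output features
    s = act_max ** alpha / w_max ** (1.0 - alpha)
    return np.clip(s, eps, None)

def modality_aware_factors(x, modality_ids, w, alpha=0.5):
    """One factor vector per modality, each computed only from that modality's tokens.

    Hypothetical stand-in: the paper learns these factors instead.
    """
    return {int(m): smoothing_factors(x[modality_ids == m], w, alpha)
            for m in np.unique(modality_ids)}

def fake_quant(t, bits=8):
    """Symmetric per-tensor round-to-nearest quantize-dequantize (illustrative choice)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(t).max() / qmax
    return np.round(t / scale) * scale

def quant_error(x_m, w, s, bits=8):
    """Relative output error after smoothing with s and fake-quantizing both operands."""
    y_ref = x_m @ w
    y_q = fake_quant(x_m / s, bits) @ fake_quant(s[:, None] * w, bits)
    return np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))
text = rng.normal(size=(128, 64))
vision = rng.normal(size=(128, 64))
vision[:, :4] *= 50.0  # hypothetical: outlier channels that appear only in vision tokens
x = np.vstack([text, vision])
ids = np.array([0] * 128 + [1] * 128)  # hypothetical labels: 0 = text, 1 = vision

shared = smoothing_factors(x, w)             # one factor vector for all tokens
per_mod = modality_aware_factors(x, ids, w)  # one factor vector per modality

for m, name in ((0, "text"), (1, "vision")):
    x_m = x[ids == m]
    print(f"{name}: shared {quant_error(x_m, w, shared):.4f}  "
          f"per-modality {quant_error(x_m, w, per_mod[m]):.4f}")
```

The printed errors only compare shared and per-modality factors on this toy data and say nothing about MASQuant's reported results. Per-modality factors also give each modality its own smoothed weight diag(s_m) W, which is presumably why a unified quantized weight needs an extra mechanism such as the paper's CMC.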

Key Contribution

Achieve stable and competitive quantization for multimodal LLMs by explicitly accounting for modality-specific characteristics and cross-modal computational differences.

Abstract

Post-training quantization (PTQ) methods with computational invariance for Large Language Models (LLMs) have demonstrated remarkable advances; however, their application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multimodal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive with state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
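The abstract describes the low-rank step of CMC only at a high level. The sketch below shows one standard construction that fits the phrase "SVD whitening to transform activation differences into low-rank forms". It Cholesky-whitens with calibration activations X (X^T X = L L^T), takes a truncated SVD of L^T ΔW, and maps back, giving the rank-r correction A B that minimizes ||X(ΔW - A B)||_F. This is the whitening used in activation-aware SVD compression, swapped in here as an assumption; the paper's exact CMC formulation may differ. The random activations and the weight difference `delta_w` are hypothetical stand-ins, and `whitened_low_rank`, `plain_low_rank`, and `output_error` are illustrative helpers.

```python
import numpy as np

def whitened_low_rank(x, delta_w, rank, ridge=1e-8):
    """Rank-`rank` factors (a, b) approximately minimizing ||x @ (delta_w - a @ b)||_F.

    x:       calibration activations, shape (tokens, in_features)
    delta_w: weight difference to compensate, shape (in_features, out_features)
    """
    gram = x.T @ x
    # Small ridge so the Cholesky factorization is stable (assumption, not from the paper)
    gram = gram + ridge * np.mean(np.diag(gram)) * np.eye(gram.shape[0])
    l = np.linalg.cholesky(gram)  # gram = l @ l.T, l lower-triangular
    # ||x @ d||_F equals ||l.T @ d||_F (exactly without the ridge), so truncate there
    u, sig, vt = np.linalg.svd(l.T @ delta_w, full_matrices=False)
    # Map back: a = l^{-T} @ u_r @ diag(sig_r), solved instead of explicitly inverting
    a = np.linalg.solve(l.T, u[:, :rank] * sig[:rank])
    b = vt[:rank]
    return a, b

def plain_low_rank(delta_w, rank):
    """Baseline: truncated SVD of delta_w itself, ignoring the activations."""
    u, sig, vt = np.linalg.svd(delta_w, full_matrices=False)
    return u[:, :rank] * sig[:rank], vt[:rank]

def output_error(x, delta_w, a, b):
    """Relative Frobenius error of the compensated output difference."""
    return np.linalg.norm(x @ (delta_w - a @ b)) / np.linalg.norm(x @ delta_w)

rng = np.random.default_rng(0)
# Hypothetical anisotropic activations standing in for one modality's tokens
x = rng.normal(size=(512, 64)) * np.linspace(0.1, 10.0, 64)
# Hypothetical difference between two modality-specific smoothed weights
delta_w = rng.normal(size=(64, 32))

for r in (4, 8, 16):
    err_w = output_error(x, delta_w, *whitened_low_rank(x, delta_w, r))
    err_p = output_error(x, delta_w, *plain_low_rank(delta_w, r))
    print(f"rank {r:2d}: whitened {err_w:.3f}  plain {err_p:.3f}")
```

Both corrections are measured with the same output-space error. By the Eckart-Young theorem applied in the whitened space, the whitened factors can be no worse than a plain truncated SVD at equal rank (up to the tiny ridge term), and at full rank they recover ΔW exactly.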

Inference & Quantization · Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 44
Year: 2026
Venue: N/A
