Mar 12, 2026 · arXiv:2603.12240

BiGain: Unified Token Compression for Joint Generation and Classification

Jiacheng Liu, Shengkun Tang, Jiacheng Cui, Dongkuan Xu, Zhiqiang Shen

AI Summary

BiGain is a training-free, plug-and-play framework for accelerating diffusion models that uses frequency-aware token compression to preserve generation quality while improving classification accuracy. It uses Laplacian-gated token merging to preserve edges and textures and Interpolate-Extrapolate KV Downsampling to conserve attention precision. Experiments across several datasets and diffusion backbones show that BiGain improves the speed-accuracy trade-off for diffusion-based classification while maintaining or enhancing generation quality. For example, on ImageNet-1K with 70% token merging on Stable Diffusion 2.0, it increases classification accuracy by 7.15% and improves FID by 0.34.

Key Contribution

Token compression in diffusion models no longer has to sacrifice classification accuracy to speed up generation: BiGain improves classification while maintaining or enhancing generation quality.

Abstract

Acceleration methods for diffusion models (e.g., token merging or downsampling) typically optimize synthesis quality under reduced compute, yet often ignore discriminative capacity. We revisit token compression with a joint objective and present BiGain, a training-free, plug-and-play framework that preserves generation quality while improving classification in accelerated diffusion models. Our key insight is frequency separation: mapping feature-space signals into a frequency-aware representation disentangles fine detail from global semantics, enabling compression that respects both generative fidelity and discriminative utility. BiGain reflects this principle with two frequency-aware operators: (1) Laplacian-gated token merging, which encourages merges among spectrally smooth tokens while discouraging merges of high-contrast tokens, thereby retaining edges and textures; and (2) Interpolate-Extrapolate KV Downsampling, which downsamples keys/values via a controllable interextrapolation between nearest and average pooling while keeping queries intact, thereby conserving attention precision. Across DiT- and U-Net-based backbones and ImageNet-1K, ImageNet-100, Oxford-IIIT Pets, and COCO-2017, our operators consistently improve the speed-accuracy trade-off for diffusion-based classification, while maintaining or enhancing generation quality under comparable acceleration. For instance, on ImageNet-1K, with 70% token merging on Stable Diffusion 2.0, BiGain increases classification accuracy by 7.15% while improving FID by 0.34 (1.85%). Our analyses indicate that balanced spectral retention, preserving high-frequency detail and low/mid-frequency semantics, is a reliable design rule for token compression in diffusion models. To our knowledge, BiGain is the first framework to jointly study and advance both generation and classification under accelerated diffusion, supporting lower-cost deployment.
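The blend between nearest and average pooling described for the KV downsampling operator can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `downsample_kv`, the parameter `alpha`, the linear blend `avg + alpha * (nearest - avg)`, and the choice of the top-left token as the "nearest" sample are all assumptions made for illustration.

```python
def downsample_kv(grid, stride=2, alpha=1.0):
    """Blend nearest and average pooling over stride x stride windows.

    Hypothetical parametrization (not taken from the paper):
      alpha = 0 -> average pooling
      alpha = 1 -> nearest pooling
      alpha outside [0, 1] -> extrapolation past either endpoint

    grid: H x W nested list of feature vectors (lists of floats).
    """
    height, width = len(grid), len(grid[0])
    if height % stride or width % stride:
        raise ValueError("grid size must be divisible by stride")
    out = []
    for top in range(0, height, stride):
        row = []
        for left in range(0, width, stride):
            # Collect every token in the current window.
            window = [grid[top + dy][left + dx]
                      for dy in range(stride) for dx in range(stride)]
            dim = len(window[0])
            avg = [sum(tok[c] for tok in window) / len(window)
                   for c in range(dim)]
            # Assumption: the top-left token stands in for nearest pooling.
            nearest = window[0]
            row.append([a + alpha * (n - a) for a, n in zip(avg, nearest)])
        out.append(row)
    return out


# A 2x2 grid of 1-d tokens. Only keys and values would be reduced this way;
# queries keep their full resolution.
keys = [[[0.0], [2.0]], [[4.0], [6.0]]]
avg_only = downsample_kv(keys, stride=2, alpha=0.0)
nearest_only = downsample_kv(keys, stride=2, alpha=1.0)
extrapolated = downsample_kv(keys, stride=2, alpha=1.5)
```

Under this assumed parametrization, values of `alpha` above 1 push the result past the nearest sample and away from the window mean, which is one way to read "extrapolation" as emphasizing the detail that averaging would smooth out.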

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 30

Year: 2026

Venue: N/A
