BiGain is a training-free, plug-and-play framework for accelerating diffusion models that targets both generation quality and classification accuracy through frequency-aware token compression. It uses two operators. Laplacian-gated token merging preserves edges and textures, and Interpolate-Extrapolate KV Downsampling conserves attention precision. Experiments across multiple datasets and diffusion backbones show that BiGain improves the speed-accuracy trade-off for diffusion-based classification while maintaining or enhancing generation quality. For example, with 70% token merging on Stable Diffusion 2.0, it raises ImageNet-1K classification accuracy by 7.15% and improves FID by 0.34.
Token compression in diffusion models no longer has to trade classification accuracy for faster generation: BiGain improves both generation and classification.
Acceleration methods for diffusion models (e.g., token merging or downsampling) typically optimize synthesis quality under reduced compute, yet often ignore discriminative capacity. We revisit token compression with a joint objective and present BiGain, a training-free, plug-and-play framework that preserves generation quality while improving classification in accelerated diffusion models. Our key insight is frequency separation: mapping feature-space signals into a frequency-aware representation disentangles fine detail from global semantics, enabling compression that respects both generative fidelity and discriminative utility. BiGain reflects this principle with two frequency-aware operators: (1) Laplacian-gated token merging, which encourages merges among spectrally smooth tokens while discouraging merges of high-contrast tokens, thereby retaining edges and textures; and (2) Interpolate-Extrapolate KV Downsampling, which downsamples keys/values via a controllable interpolation or extrapolation between nearest and average pooling while keeping queries intact, thereby conserving attention precision. Across DiT- and U-Net-based backbones and on ImageNet-1K, ImageNet-100, Oxford-IIIT Pets, and COCO-2017, our operators consistently improve the speed-accuracy trade-off for diffusion-based classification while maintaining or enhancing generation quality under comparable acceleration. For instance, on ImageNet-1K with 70% token merging on Stable Diffusion 2.0, BiGain increases classification accuracy by 7.15% while improving FID by 0.34 (1.85%). Our analyses indicate that balanced spectral retention, which preserves both high-frequency detail and low/mid-frequency semantics, is a reliable design rule for token compression in diffusion models. To our knowledge, BiGain is the first framework to jointly study and advance both generation and classification under accelerated diffusion, supporting lower-cost deployment.
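The abstract describes Laplacian-gated token merging only at the level of intent: favor merges among spectrally smooth tokens and discourage merges of high-contrast ones. The following is therefore a minimal Python sketch under stated assumptions, not BiGain's implementation. It makes four assumptions:

- Tokens are merged by a ToMe-style bipartite matching over an alternating source/destination split.
- Each token's high-frequency content is the L2 norm of a 4-neighbour discrete Laplacian over the H×W feature grid.
- The merge score is cosine similarity minus an `alpha`-weighted penalty on both tokens' normalised Laplacian energy.
- The names `laplacian_energy`, `laplacian_gated_merge`, and `alpha` are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def laplacian_energy(grid: List[List[Vec]]) -> List[List[float]]:
    """High-frequency score per token: L2 norm of the 4-neighbour discrete
    Laplacian of the feature vectors, with replicate padding at the borders."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            c = grid[i][j]
            nbrs = [grid[max(i - 1, 0)][j], grid[min(i + 1, h - 1)][j],
                    grid[i][max(j - 1, 0)], grid[i][min(j + 1, w - 1)]]
            lap = [sum(n[k] for n in nbrs) - 4.0 * c[k] for k in range(len(c))]
            out[i][j] = math.sqrt(sum(v * v for v in lap))
    return out


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def laplacian_gated_merge(grid: List[List[Vec]], r: int,
                          alpha: float = 1.0) -> Tuple[List[Vec], List[int]]:
    """Merge r source tokens into destination tokens (ToMe-style bipartite
    matching). Hypothetical gated score: cosine similarity minus alpha times
    the normalised Laplacian energy of both tokens, so high-contrast tokens
    (edges, textures) are less likely to be merged away."""
    h, w = len(grid), len(grid[0])
    energy = laplacian_energy(grid)
    e_max = max(max(row) for row in energy) or 1.0  # avoid divide-by-zero
    tokens = [grid[i][j] for i in range(h) for j in range(w)]
    gate = [energy[i][j] / e_max for i in range(h) for j in range(w)]

    # Alternate split of the flattened tokens: even = sources, odd = destinations.
    src = list(range(0, len(tokens), 2))
    dst = list(range(1, len(tokens), 2))

    # Each source proposes its best destination under the gated score.
    proposals = []
    for s in src:
        score, d = max(
            (cosine(tokens[s], tokens[t]) - alpha * (gate[s] + gate[t]), t)
            for t in dst)
        proposals.append((score, s, d))
    proposals.sort(reverse=True)
    chosen = proposals[:r]

    # Average each chosen source into its destination (count-weighted mean).
    sums = {d: list(tokens[d]) for d in dst}
    counts = {d: 1 for d in dst}
    for _, s, d in chosen:
        sums[d] = [a + b for a, b in zip(sums[d], tokens[s])]
        counts[d] += 1
    merged = sorted(s for _, s, _ in chosen)
    kept = [tokens[s] for s in src if s not in merged]
    kept += [[v / counts[d] for v in sums[d]] for d in dst]
    return kept, merged


if __name__ == "__main__":
    # Toy 1 x 4 row: tokens 0-2 are identical, token 3 differs (an "edge").
    row = [[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]
    kept, merged = laplacian_gated_merge(row, r=1)
    print(merged)  # [0]: token 2 borders the edge, so flat token 0 is merged
```

In the toy row, token 2 sits next to the feature change at token 3, so it receives the maximum Laplacian penalty. The single allowed merge therefore goes to the flat token 0. Setting `alpha=0.0` reduces the sketch to plain similarity-based bipartite merging, and larger values protect high-frequency tokens more aggressively. A practical version would operate on batched tensors rather than Python lists.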
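Interpolate-Extrapolate KV Downsampling is likewise only outlined in the abstract. The sketch below assumes one plausible parameterisation, which may differ from the paper's exact formulation:

- Split the key grid into stride×stride blocks. Do the same, separately, for the value grid.
- For each block, take a nearest-pooled token (here the block's top-left token) and the average-pooled token.
- Return `nearest + lam * (avg - nearest)`. With this form, `lam = 0` is nearest pooling, `lam = 1` is average pooling, and values outside [0, 1] extrapolate.
- Leave queries at full resolution.

The function names and `lam` are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Grid = List[List[Vec]]


def interp_extrap_pool(grid: Grid, stride: int, lam: float) -> List[Vec]:
    """Downsample an H x W grid of vectors by `stride` in each dimension.
    nearest = top-left token of each block; avg = block mean.
    Returns nearest + lam * (avg - nearest): lam=0 -> nearest pooling,
    lam=1 -> average pooling, lam outside [0, 1] -> extrapolation
    (hypothetical parameterisation)."""
    h, w = len(grid), len(grid[0])
    out = []
    for bi in range(0, h, stride):
        for bj in range(0, w, stride):
            block = [grid[i][j] for i in range(bi, min(bi + stride, h))
                     for j in range(bj, min(bj + stride, w))]
            near = block[0]
            avg = [sum(v[k] for v in block) / len(block) for k in range(len(near))]
            out.append([n + lam * (a - n) for n, a in zip(near, avg)])
    return out


def attention(queries: List[Vec], keys: List[Vec], values: List[Vec]) -> List[Vec]:
    """Plain scaled dot-product attention with a numerically stable softmax."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        m = max(logits)
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        outputs.append([sum(wt * v[d] for wt, v in zip(weights, values)) / z
                        for d in range(len(values[0]))])
    return outputs


def downsampled_kv_attention(q_grid: Grid, k_grid: Grid, v_grid: Grid,
                             stride: int = 2, lam: float = 0.5) -> List[Vec]:
    """Queries keep full resolution; only keys and values are pooled, so the
    output still has one vector per query token."""
    queries = [q for row in q_grid for q in row]
    keys = interp_extrap_pool(k_grid, stride, lam)
    values = interp_extrap_pool(v_grid, stride, lam)
    return attention(queries, keys, values)


if __name__ == "__main__":
    grid = [[[float(i), float(j)] for j in range(4)] for i in range(4)]
    out = downsampled_kv_attention(grid, grid, grid, stride=2, lam=0.5)
    print(len(out))  # 16: one output per full-resolution query
```

With `stride=2` on a 4×4 grid, all 16 queries still produce outputs, but each attends over only 4 pooled key/value tokens. The attention matrix therefore shrinks from 16×16 to 16×4. Pooling only keys and values, never queries, is what keeps the output at full token resolution.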