This paper investigates the role of spatial scale in deep-feature-based perceptual similarity models for Image Quality Assessment (IQA). The authors introduce MSDS, a multiscale extension of DeepSSIM that computes DeepSSIM independently at each pyramid level and fuses the resulting scores with learnable weights. Experiments on benchmark datasets show that MSDS consistently and significantly outperforms the single-scale baseline, demonstrating the importance of spatial scale in deep perceptual similarity.
Turns out, you can get significantly better image quality assessment just by intelligently combining deep features across different spatial scales.
Deep-feature-based perceptual similarity models have demonstrated strong alignment with human visual perception in Image Quality Assessment (IQA). However, most existing approaches operate at a single spatial scale, implicitly assuming that structural similarity at a fixed resolution is sufficient. The role of spatial scale in deep-feature similarity modeling thus remains insufficiently understood. In this letter, we isolate spatial scale as an independent factor using a minimal multiscale extension of DeepSSIM, referred to as Deep Structural Similarity with Multiscale Representation (MSDS). The proposed framework decouples deep feature representation from cross-scale integration by computing DeepSSIM independently across pyramid levels and fusing the resulting scores with a lightweight set of learnable global weights. Experiments on multiple benchmark datasets demonstrate consistent and statistically significant improvements over the single-scale baseline, while introducing negligible additional complexity. The results empirically confirm spatial scale as a non-negligible factor in deep perceptual similarity, isolated here via a minimal testbed.
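The fusion scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `deep_ssim` is a stand-in (a normalized pixel correlation) for the actual deep-feature DeepSSIM score, the pyramid is approximated by 2x average pooling, and the softmax over the global weights is an assumed normalization.

```python
import numpy as np

def downsample(img):
    # 2x average pooling as a stand-in for one Gaussian-pyramid step.
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def deep_ssim(ref, dist):
    # Placeholder similarity: zero-mean normalized correlation of raw
    # pixels. The actual DeepSSIM compares deep feature maps instead.
    r = ref - ref.mean()
    d = dist - dist.mean()
    denom = np.sqrt((r ** 2).sum() * (d ** 2).sum()) + 1e-12
    return float((r * d).sum() / denom)

def msds(ref, dist, weights, levels=3):
    # Score each pyramid level independently, then fuse the per-level
    # scores with softmax-normalized global weights (learnable in the
    # paper; fixed here for illustration).
    scores = []
    for _ in range(levels):
        scores.append(deep_ssim(ref, dist))
        ref, dist = downsample(ref), downsample(dist)
    w = np.exp(weights - np.max(weights))
    w = w / w.sum()
    return float(np.dot(w, scores))
```

With equal weights this reduces to a plain average of per-level scores; learning the weights lets the model emphasize the scales that best predict perceived quality.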