Search papers, labs, and topics across Lattice.
The paper introduces Semantic Similarity as a new evaluation task for low-level image processing, addressing the limitations of fidelity-based metrics in the context of deep learning and generative models. They formalize image semantics using entities and relations, and propose the Triplet-based Semantic Similarity Score (T3S). T3S outperforms existing metrics by combining semantic entity extraction, foreground-background disentanglement, and open-world class/relation modeling, demonstrating its ability to better reflect semantic changes under various degradations.
Fidelity metrics can be fooled: evaluating low-level image processing requires a new approach that directly measures preservation of semantic content.
Low-level image processing has long been evaluated mainly from the perspective of visual fidelity. However, with the rise of deep learning and generative models, processed images may preserve perceptual quality while altering semantic content, making conventional Image Quality Assessment (IQA) insufficient for semantic-level assessment. In this paper, we formalize \textit{Semantic Similarity} as a new evaluation task for low-level image processing, aimed at measuring whether semantic content is preserved after processing. We further present a structured formulation of image semantics based on semantic entities and their relations, and discuss the desired properties and constraints of a valid semantic similarity index. Based on this formulation, we propose Triplet-based Semantic Similarity Score (T3S), which models image semantics through foreground entities, background entities, and relations. T3S combines semantic entity extraction, foreground-background disentanglement, and open-world class/relation modeling. Experiments on COCO and SPA-Data show that T3S consistently outperforms existing fidelity-oriented metrics and representative semantic-level baselines, while better reflecting progressive semantic changes under diverse degradations. These results highlight the importance of semantic assessment in modern low-level vision.