Mar 30, 2026 · arXiv:2603.28363

SEA: Evaluating Sketch Abstraction Efficiency via Element-level Commonsense Visual Question Answering

Jiho Park, Sieun Choi, Jaeyoon Seo, Minho Sohn, Yeana Kim, Jihie Kim

AI Summary

This paper introduces SEA, a reference-free metric for evaluating the abstraction efficiency of sketches by assessing how economically they represent class-defining visual elements derived from commonsense knowledge. SEA uses a visual question answering model to determine whether each element is present in a sketch and returns a score that quantifies semantic retention under visual economy. The authors also present CommonSketch, a semantically annotated dataset of 23,100 human-drawn sketches across 300 classes. They report that SEA aligns closely with human judgments and that CommonSketch serves as a benchmark for element-level sketch understanding.

Key Contribution

A way to measure how efficiently a sketch conveys meaning, moving beyond simple recognition accuracy.

Abstract

A sketch is a distilled form of visual abstraction that conveys core concepts through simplified yet purposeful strokes while omitting extraneous detail. Despite its expressive power, quantifying the efficiency of semantic abstraction in sketches remains challenging. Existing evaluation methods that rely on reference images, low-level visual features, or recognition accuracy do not capture abstraction, the defining property of sketches. To address these limitations, we introduce SEA (Sketch Evaluation metric for Abstraction efficiency), a reference-free metric that assesses how economically a sketch represents class-defining visual elements while preserving semantic recognizability. These elements are derived per class from commonsense knowledge about features typically depicted in sketches. SEA leverages a visual question answering model to determine the presence of each element and returns a quantitative score that reflects semantic retention under visual economy. To support this metric, we present CommonSketch, the first semantically annotated sketch dataset, comprising 23,100 human-drawn sketches across 300 classes, each paired with a caption and element-level annotations. Experiments show that SEA aligns closely with human judgments and reliably discriminates levels of abstraction efficiency, while CommonSketch serves as a benchmark providing systematic evaluation of element-level sketch understanding across various vision-language models.
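The element-checking step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the abstract does not give the SEA formula, the question template, or the VQA model, so `element_presence`, `presence_rate`, and `toy_ask` are hypothetical names, and the presence rate shown is only a plausible ingredient of such a metric, not SEA itself.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a VQA backend that takes a sketch and a yes/no
# question and returns True when the model answers "yes".
VqaFn = Callable[[object, str], bool]


def element_presence(
    sketch: object, class_name: str, elements: List[str], ask: VqaFn
) -> Dict[str, bool]:
    """Ask one yes/no question per class-defining element (assumed template)."""
    return {
        element: ask(sketch, f"Does this sketch of a {class_name} show {element}?")
        for element in elements
    }


def presence_rate(presence: Dict[str, bool]) -> float:
    """Fraction of elements judged present. This is NOT the SEA formula."""
    if not presence:
        return 0.0
    return sum(presence.values()) / len(presence)


def toy_ask(sketch: dict, question: str) -> bool:
    """Stand-in for a real VQA model: answers yes if a drawn part is named."""
    return any(part in question for part in sketch["drawn"])


# Toy sketch that depicts only two of the four listed elements.
sketch = {"drawn": {"wheels", "handlebar"}}
presence = element_presence(
    sketch, "bicycle", ["wheels", "handlebar", "pedals", "seat"], toy_ask
)
rate = presence_rate(presence)
```

In practice `toy_ask` would be replaced by a call to a vision-language model, and the per-element answers would feed whatever scoring rule the paper defines for balancing retained elements against visual economy.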

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
