Universidad de Talca · Apr 21, 2026 · arXiv:2604.19323

Concept Inconsistency in Dermoscopic Concept Bottleneck Models: A Rough-Set Analysis of the Derm7pt Dataset

Gonzalo Nápoles, Isel Grau, Yamisleydi Salgueiro

AI Summary

The paper analyzes concept-level inconsistencies in the Derm7pt dermoscopy dataset using rough set theory. It finds that 16.4% of unique concept profiles (50 of 305) are inconsistent, which caps the theoretical accuracy of hard Concept Bottleneck Models (CBMs) at 92.1%. The authors then build a fully consistent subset, Derm7pt+ (705 images), by removing all boundary-region images, and evaluate a hard CBM across 19 backbone architectures, establishing baselines for concept-consistent CBM evaluation. Under symmetric filtering, EfficientNet-B5 achieves the best label F1 score (0.85) and label accuracy (0.90); under asymmetric filtering, EfficientNet-B7 leads on all four metrics.

Key Contribution

About 30% of the images in the Derm7pt dermoscopy dataset (306 images) have a concept profile that also occurs with a different diagnosis, imposing a hard accuracy ceiling of 92.1% on Concept Bottleneck Models that operate exclusively on hard concepts.
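Why 30.3% of images in conflicting profiles costs only 7.9 points of accuracy follows from standard rough-set reasoning: a model that sees only the hard concept profile must give every image in a profile the same label, so its best strategy is to predict the majority diagnosis of each profile, and only the minority-label images are necessarily lost. As a worked check, assuming the ceiling is computed this way (which is consistent with the reported figures), write $U$ for the set of images, $U/{\sim}$ for its partition into identical concept profiles, $D_d$ for the images with diagnosis $d$, and $\mathrm{POS}$ for the union of consistent profiles (notation chosen here for illustration), and take $|U| = 705 + 306 = 1011$ from the abstract:

```latex
\[
\text{ceiling} = \frac{1}{|U|} \sum_{X \in U/{\sim}} \max_{d} \, \lvert X \cap D_d \rvert ,
\qquad
\gamma = \frac{\lvert \mathrm{POS} \rvert}{|U|} = \frac{705}{1011} \approx 0.697 ,
\]
\[
0.921 \times 1011 \approx 931
\quad\Longrightarrow\quad
1011 - 931 = 80 \text{ images misclassified by any hard CBM,}
\qquad
931 - 705 = 226 \text{ of the 306 boundary images still recoverable.}
\]
```

The quality of classification $\gamma$ and the accuracy ceiling therefore measure different things: $\gamma$ counts only images in fully consistent profiles, while the ceiling also credits the majority-label images inside inconsistent ones.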

Abstract

Concept Bottleneck Models (CBMs) route predictions exclusively through a clinically grounded concept layer, binding interpretability to concept-label consistency. When a dataset contains concept-level inconsistencies, identical concept profiles mapped to conflicting diagnosis labels create an unresolvable bottleneck that imposes a hard ceiling on achievable accuracy. In this paper, we apply rough set theory to the Derm7pt dermoscopy benchmark and characterize the full extent and clinical structure of this inconsistency. Among 305 unique concept profiles formed by the 7 dermoscopic criteria of the 7-point melanoma checklist, 50 (16.4%) are inconsistent, spanning 306 images (30.3% of the dataset). For CBMs that operate exclusively on hard concepts, this yields a theoretical accuracy ceiling of 92.1%, independent of backbone architecture or training strategy. In addition, we characterize the conflict-severity distribution, identify the clinical features most responsible for boundary ambiguity, and evaluate two filtering strategies, quantifying their effects on dataset composition and CBM interpretability. Symmetric removal of all boundary-region images yields Derm7pt+, a fully consistent benchmark subset of 705 images with perfect quality of classification (every remaining concept profile maps to a single diagnosis) and therefore no hard accuracy ceiling. Building on this filtered dataset, we present a hard CBM evaluated across 19 backbone architectures from the EfficientNet, DenseNet, ResNet, and Wide ResNet families. Under symmetric filtering, explored for completeness, EfficientNet-B5 achieves the best label F1 score (0.85) and label accuracy (0.90) on the held-out test set, with a concept accuracy of 0.70. Under asymmetric filtering, EfficientNet-B7 leads across all four metrics, reaching a label F1 score of 0.82 and a concept accuracy of 0.70. These results establish reproducible baselines for concept-consistent CBM evaluation on dermoscopic data.
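The rough-set quantities in the abstract (unique profiles, inconsistent profiles, boundary-region images, quality of classification, and the hard-CBM accuracy ceiling) can be computed from a table of hard concept annotations in a few lines. The sketch below is not the authors' code. It assumes each image is encoded as a tuple of discrete concept values plus one diagnosis label, takes the ceiling to be the majority-vote accuracy within each profile, and implements only symmetric filtering, since the abstract does not define the asymmetric variant. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical encoding: one hard (discrete) value per concept, e.g. 7 entries for 7 criteria.
Profile = Tuple[int, ...]


def indiscernibility_classes(profiles: Sequence[Profile]) -> Dict[Profile, List[int]]:
    """Group image indices by identical hard concept profile (rough-set equivalence classes)."""
    classes: Dict[Profile, List[int]] = defaultdict(list)
    for i, profile in enumerate(profiles):
        classes[tuple(profile)].append(i)
    return dict(classes)


def rough_set_summary(profiles: Sequence[Profile], labels: Sequence[Hashable]) -> dict:
    """Inconsistency statistics and the majority-vote accuracy ceiling for a hard CBM."""
    n = len(labels)
    classes = indiscernibility_classes(profiles)
    # A profile is inconsistent (boundary region) if its images carry more than one label.
    inconsistent = [idx for idx in classes.values() if len({labels[i] for i in idx}) > 1]
    boundary = sorted(i for idx in inconsistent for i in idx)
    # Any function of the hard profile assigns one label per class; the best choice is
    # the class's majority label, so summing majority counts gives the ceiling.
    best_correct = sum(
        Counter(labels[i] for i in idx).most_common(1)[0][1] for idx in classes.values()
    )
    return {
        "n_profiles": len(classes),
        "n_inconsistent_profiles": len(inconsistent),
        "boundary_images": boundary,
        "quality_of_classification": (n - len(boundary)) / n,  # |POS| / |U|
        "accuracy_ceiling": best_correct / n,
    }


def symmetric_filter(
    profiles: Sequence[Profile], labels: Sequence[Hashable]
) -> Tuple[List[Profile], List[Hashable]]:
    """Remove every boundary-region image, keeping only the positive region."""
    drop = set(rough_set_summary(profiles, labels)["boundary_images"])
    keep = [i for i in range(len(labels)) if i not in drop]
    return [profiles[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    # Invented toy data: two binary concepts, two diagnoses.
    toy_profiles = [(1, 1), (1, 1), (1, 1), (0, 1), (0, 1), (0, 0)]
    toy_labels = ["mel", "mel", "nev", "nev", "nev", "nev"]
    print(rough_set_summary(toy_profiles, toy_labels))
    kept_profiles, kept_labels = symmetric_filter(toy_profiles, toy_labels)
    print(rough_set_summary(kept_profiles, kept_labels))
```

On the toy data, one of three profiles is inconsistent, so half of the images fall in the boundary region (quality of classification 0.5), yet the ceiling is 5/6 because the two majority-label images in that profile remain classifiable; after symmetric filtering both values are 1.0. This mirrors the gap between the 30.3% of boundary images and the 92.1% ceiling reported for Derm7pt.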

Computer Vision · Interpretability & Mechanistic Interp

