The paper uses rough set theory to analyze concept-level inconsistencies in the Derm7pt dermoscopy dataset. It finds that 16.4% of concept profiles are inconsistent, which caps the theoretical accuracy of Concept Bottleneck Models (CBMs). The authors then build a fully consistent subset, Derm7pt+, by removing all boundary-region images, and evaluate a hard CBM across 19 backbone architectures, establishing new baselines for concept-consistent CBM evaluation. EfficientNet-B5 achieves the best label F1 score and label accuracy under symmetric filtering, while EfficientNet-B7 leads on all four reported metrics under asymmetric filtering.
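The symmetric filter can be illustrated with a short Python sketch. It assumes each image is represented by a hashable tuple of its dermoscopic criterion values plus a diagnosis label; the function name `symmetric_filter` and the toy data are hypothetical and not taken from the paper's code. An image is kept only if every image sharing its concept profile carries the same label, i.e. it lies in the rough-set positive region, and the fraction of images kept is the quality of classification γ.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def symmetric_filter(
    profiles: Sequence[tuple[Hashable, ...]],
    labels: Sequence[Hashable],
) -> tuple[list[int], float]:
    """Hypothetical sketch: drop every image in the rough-set boundary region.

    Returns the indices of the kept images and the quality of classification
    (fraction of images in the positive region) of the input dataset.
    """
    # Collect the set of diagnosis labels observed for each concept profile.
    labels_per_profile: dict[tuple[Hashable, ...], set[Hashable]] = defaultdict(set)
    for profile, label in zip(profiles, labels):
        labels_per_profile[profile].add(label)

    # Keep an image only if its profile maps to exactly one label (consistent).
    kept = [i for i, p in enumerate(profiles) if len(labels_per_profile[p]) == 1]
    quality = len(kept) / len(profiles)
    return kept, quality


# Toy example with 2 criteria instead of the 7 used in Derm7pt.
profiles = [("atypical", "present"), ("atypical", "present"),
            ("typical", "absent"), ("typical", "absent"), ("absent", "absent")]
labels = ["melanoma", "nevus", "nevus", "nevus", "melanoma"]
kept, gamma = symmetric_filter(profiles, labels)
print(kept, gamma)  # [2, 3, 4] 0.6
```

Re-running `symmetric_filter` on the kept subset returns every index with γ = 1.0, which corresponds to the perfect quality of classification claimed for Derm7pt+. Asymmetric filtering is not defined in this summary, so it is not sketched here.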
A surprising 30.3% of images in the Derm7pt dermoscopy dataset fall under concept profiles that map to conflicting diagnoses, imposing a hard accuracy ceiling of 92.1% on Concept Bottleneck Models that operate on hard concepts.
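The ceiling follows from the bottleneck itself: a hard-concept CBM maps each discrete concept profile to a single label, so within an inconsistent profile it can at best predict the majority diagnosis, giving ceiling = (1/N) Σ_p max_y n(p, y), where n(p, y) counts images with profile p and label y. A minimal Python sketch of that bound follows; it assumes the ceiling is the share of images a per-profile majority vote gets right, the function name `accuracy_ceiling` is hypothetical, and the paper's exact procedure may differ.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def accuracy_ceiling(
    profiles: Sequence[tuple[Hashable, ...]],
    labels: Sequence[Hashable],
) -> float:
    """Upper bound on accuracy for any classifier that sees only the concept profile."""
    # Count how often each diagnosis occurs for each concept profile.
    counts: dict[tuple[Hashable, ...], Counter] = defaultdict(Counter)
    for profile, label in zip(profiles, labels):
        counts[profile][label] += 1
    # The best a profile-only predictor can do is output each profile's majority label.
    best_correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return best_correct / len(labels)


# Toy example: the first profile is inconsistent (one melanoma, one nevus).
profiles = [("atypical", "present"), ("atypical", "present"),
            ("typical", "absent"), ("typical", "absent"), ("absent", "absent")]
labels = ["melanoma", "nevus", "nevus", "nevus", "melanoma"]
print(accuracy_ceiling(profiles, labels))  # 0.8
```

On a fully consistent dataset every profile has a single label, so the bound is 1.0; this is why removing the boundary region also removes the ceiling.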
Concept Bottleneck Models (CBMs) route predictions exclusively through a clinically grounded concept layer, binding interpretability to concept-label consistency. When a dataset contains concept-level inconsistencies, that is, identical concept profiles mapped to conflicting diagnosis labels, the bottleneck cannot resolve them, which imposes a hard ceiling on achievable accuracy. In this paper, we apply rough set theory to the Derm7pt dermoscopy benchmark and characterize the full extent and clinical structure of this inconsistency. Among 305 unique concept profiles formed by the 7 dermoscopic criteria of the 7-point melanoma checklist, 50 (16.4%) are inconsistent, spanning 306 images (30.3% of the dataset). For CBMs that operate exclusively on hard concepts, this yields a theoretical accuracy ceiling of 92.1%, independent of backbone architecture or training strategy. In addition, we characterize the conflict-severity distribution, identify the clinical features most responsible for boundary ambiguity, and evaluate two filtering strategies with quantified effects on dataset composition and CBM interpretability. Symmetric removal of all boundary-region images yields Derm7pt+, a fully consistent benchmark subset of 705 images with a perfect rough-set quality of classification and therefore no hard accuracy ceiling. Building on this filtered dataset, we present a hard CBM evaluated across 19 backbone architectures from the EfficientNet, DenseNet, ResNet, and Wide ResNet families. Under symmetric filtering, explored for completeness, EfficientNet-B5 achieves the best label F1 score (0.85) and label accuracy (0.90) on the held-out test set, with a concept accuracy of 0.70. Under asymmetric filtering, EfficientNet-B7 leads on all four evaluation metrics, reaching a label F1 score of 0.82 and a concept accuracy of 0.70. These results establish reproducible baselines for concept-consistent CBM evaluation on dermoscopic data.
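The claim that a hard CBM routes predictions exclusively through the concept layer can be made concrete with a framework-free Python sketch. It assumes the concept head outputs, for each criterion, a probability vector over that criterion's discrete states, and that the label head is any function of the resulting hard profile. `hard_concepts`, `hard_cbm_predict`, and the toy scoring rule are hypothetical stand-ins, not the paper's trained model or backbone.

```python
from typing import Callable, Sequence

HardProfile = tuple[int, ...]


def hard_concepts(concept_probs: Sequence[Sequence[float]]) -> HardProfile:
    """Pick the most probable state of each criterion (argmax), discarding confidence."""
    return tuple(max(range(len(p)), key=lambda k: p[k]) for p in concept_probs)


def hard_cbm_predict(
    concept_probs: Sequence[Sequence[float]],
    label_head: Callable[[HardProfile], str],
) -> str:
    # The label head receives only the hard profile, never image features or soft scores.
    return label_head(hard_concepts(concept_probs))


# Hypothetical toy label head: melanoma if the summed state indices reach 2.
def toy_label_head(profile: HardProfile) -> str:
    return "melanoma" if sum(profile) >= 2 else "nevus"


# Two images with different soft scores but the same hard profile (1, 1).
image_a = [[0.1, 0.9], [0.4, 0.6]]
image_b = [[0.45, 0.55], [0.2, 0.8]]
print(hard_cbm_predict(image_a, toy_label_head), hard_cbm_predict(image_b, toy_label_head))
# melanoma melanoma
```

Because both images reduce to the same profile, no label head can assign them different diagnoses. If their true labels disagree, one of them is necessarily misclassified, which is the mechanism behind the 92.1% ceiling on the full Derm7pt.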