Feb 23, 2026 · arXiv:2602.20068

The Invisible Gorilla Effect in Out-of-distribution Detection

Harry Anthony, Ziyun Liang, Hermione Warr, Konstantinos Kamnitsas

AI Summary

This paper identifies and characterizes a previously unreported bias in out-of-distribution (OOD) detection, termed the "Invisible Gorilla Effect": for hard-to-detect (near-OOD) artefacts, detection performance depends on how visually similar the artefact is to the model's region of interest (ROI). Through experiments on skin lesion classification and other benchmarks, the authors show that OOD detection performance drops when the artefact's visual features, such as colour, differ from those of the ROI. The study annotates artefacts by colour in 11,355 images from three public datasets, generates colour-swapped counterfactuals to rule out dataset bias, and evaluates 40 OOD detection methods across 7 benchmarks, revealing an overlooked failure mode in current OOD detection techniques.

Key Contribution

For near-OOD artefacts, most OOD detectors perform markedly worse when the artefact's colour differs from that of the model's region of interest.

Abstract

Deep Neural Networks achieve high performance in vision tasks by learning features from regions of interest (ROI) within images, but their performance degrades when deployed on out-of-distribution (OOD) data that differs from training data. This challenge has led to OOD detection methods that aim to identify and reject unreliable predictions. Although prior work shows that OOD detection performance varies by artefact type, the underlying causes remain underexplored. To this end, we identify a previously unreported bias in OOD detection: for hard-to-detect artefacts (near-OOD), detection performance typically improves when the artefact shares visual similarity (e.g. colour) with the model's ROI and drops when it does not, a phenomenon we term the Invisible Gorilla Effect. For example, in a skin lesion classifier with red lesion ROI, we show the method Mahalanobis Score achieves a 31.5% higher AUROC when detecting OOD red ink (similar to ROI) compared to black ink (dissimilar) annotations. We annotated artefacts by colour in 11,355 images from three public datasets (e.g. ISIC) and generated colour-swapped counterfactuals to rule out dataset bias. We then evaluated 40 OOD methods across 7 benchmarks and found significant performance drops for most methods when artefacts differed from the ROI. Our findings highlight an overlooked failure mode in OOD detection and provide guidance for more robust detectors. Code and annotations are available at: https://github.com/HarryAnthony/Invisible_Gorilla_Effect.
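The abstract's headline comparison is stated in terms of the Mahalanobis Score and AUROC. As a rough illustration of how these two quantities fit together, the sketch below implements a simplified Mahalanobis detector (per-class means plus one shared covariance fitted on in-distribution features; a sample is scored by its negative squared distance to the nearest class mean, in the spirit of Lee et al., 2018, but without input pre-processing or multi-layer ensembling) and computes AUROC as the probability that an in-distribution sample scores higher than an OOD one. This is not the authors' code: the function names, the synthetic Gaussian "features", and the near/far shifts are hypothetical stand-ins chosen for illustration, and the demo does not reproduce any result from the paper.

```python
import numpy as np


def fit_class_gaussians(features: np.ndarray, labels: np.ndarray):
    """Estimate per-class means and one shared (tied) covariance from ID training features."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Centre each sample on its own class mean before pooling the covariance.
    centred = features - means[np.searchsorted(classes, labels)]
    cov = centred.T @ centred / len(features)
    precision = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    return means, precision


def mahalanobis_score(features: np.ndarray, means: np.ndarray, precision: np.ndarray) -> np.ndarray:
    """Negative squared distance to the closest class mean: higher means 'more in-distribution'."""
    diffs = features[:, None, :] - means[None, :, :]  # shape (N, C, D)
    sq_dist = np.einsum("ncd,de,nce->nc", diffs, precision, diffs)
    return -sq_dist.min(axis=1)


def auroc(scores_id: np.ndarray, scores_ood: np.ndarray) -> float:
    """P(random ID score > random OOD score), ties counted as 1/2 (Mann-Whitney form)."""
    diff = scores_id[:, None] - scores_ood[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Hypothetical demo: "features" are synthetic Gaussian samples, not real model embeddings.
rng = np.random.default_rng(0)
dim = 4
centres = np.array([[0.0] * dim, [5.0] + [0.0] * (dim - 1)])
train_labels = rng.integers(0, 2, size=500)
train_feats = centres[train_labels] + rng.normal(size=(500, dim))
means, precision = fit_class_gaussians(train_feats, train_labels)

test_id = centres[rng.integers(0, 2, size=200)] + rng.normal(size=(200, dim))
near_ood = test_id + np.array([0.0, 1.0, 0.0, 0.0])   # small shift: hard to detect
far_ood = test_id + np.array([0.0, 20.0, 0.0, 0.0])   # large shift: easy to detect

s_id = mahalanobis_score(test_id, means, precision)
auroc_near = auroc(s_id, mahalanobis_score(near_ood, means, precision))
auroc_far = auroc(s_id, mahalanobis_score(far_ood, means, precision))
print(f"near-OOD AUROC: {auroc_near:.3f}  far-OOD AUROC: {auroc_far:.3f}")
```

Pooling one covariance across classes keeps the estimate stable when each class has few samples, and the pseudo-inverse avoids failures when feature dimensions are collinear. The pairwise `auroc` is quadratic in memory; for large evaluation sets, scikit-learn's `roc_auc_score` (with ID labelled 1 and OOD labelled 0) computes the same quantity more efficiently.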

Computer Vision · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
